Journal of Applied Psychology 


John G. Darley, Editor
University of Minnesota 





Table of Contents 


Vocational Interest Differences Among Engineers Employed in Different Functions: M. D. 
Dunnette 


Colors and Mood-Tones: D. C. Murray and H. L. Deabler 

Interests of Fathers and Sons: E. K. Strong, Jr 

Reported Driving Speeds and Previous Accidents: R. G. Stewart 

The Measurement of Creativity in Machine Design: W. A. Owens, C. F. Schumacher, and J. B. 


Personality Dynamics and Accident Proneness in an Industrial Setting: A. Davids and J. T. 


The Relationship Between the Edwards Personal Preference Schedule Variables and the 
Minnesota Multiphasic Personality Inventory Scales: R. M. Allen 


An Investigation of Dissimulation on the MMPI by Means of the “Lie Detector”: 
Calvin and C. Hanley 


An Adjective Check List for the Study of “Product Personality”: W. D. Wells, F. J. Andriuli, 
F. J. Goi, and S. Seader


An Experimental Test of the Effects of “Developmental” vs. “Free” Discussions on the 
Quality of Group Decisions: N. R. F. Maier and R. A. Maier 


Estimation of the Reliability of Average of Rankings: H. A. Edgerton 


A Comparison of Two Modes of Prosthetic Prehension Force Control by Arm Amputees: Hilde 
Groth and J. Lyman 


Rate of Force Application in a Simple Reaction Time Test: E. T. Klemmer 
Positive and Negative Faking on a Forced-Choice Authoritarian Scale: W. A. Kaess and S. L. 


An Easier “Male” Mechanical Test for Use with Women: W. G. Mollenkopf 





American Psychological Association 


Volume 41, Number 5 October, 1957 





Consulting Editors 


Harold E. Burtt, Ohio State University

Alphonse Chapanis, Johns Hopkins University

Clifford E. Jurgensen, Minneapolis Gas Company

Laurence S. McGaughran, University of Houston

Quinn McNemar, Stanford University


Alexander Mintz, City College of New York

Harold F. Rothe, Fairbanks, Morse and Company

Julian B. Rotter, Ohio State University

Thomas A. Ryan, Cornell University

Donald E. Super, Columbia University

Miles A. Tinker, University of Minnesota

Alfred C. Welch, University of New Mexico


Arthur C. Hoffman, Managing Editor
Helen Orr, Assistant Managing Editor
Editorial Staff: Sarah Womack, Frances H. Clark, Barbara Cummings, Sadie J. Doyle





This journal gives primary consideration to origi- 
nal investigations in any field of applied psychol- 
ogy except clinical and consulting psychology, al- 
though a descriptive or theoretical article may be 
accepted if it represents a special contribution in 
an applied field. Quantitative investigations of in- 
terest or value to psychologists working in the fol- 
lowing broad fields will be considered: vocational 
and educational prognosis, diagnosis, and guidance 
at the secondary and college level; personnel re- 
search in business, industry, and government; bio- 
mechanics; industrial working conditions; research 
on opinion and morale factors; job analysis and 
classification research; market and advertising research.


Because of the large number of manuscripts sub- 
mitted, authors should adhere to the rule of 


“brevity consistent with clarity.” The typical 
manuscript should run to approximately 4,000 
words. There is a lag of approximately twelve
months between receipt and publication of an 
article. Authors may request advanced publica- 
tion if they are prepared to pay the cost of print- 
ing the necessary extra pages. 


Manuscripts should be addressed to the Editor, 
John G. Darley, 408 Johnston Hall, University of 
Minnesota, Minneapolis 14, Minnesota. All manu- 
scripts should be submitted in duplicate. Original 
figures are prepared for publication; duplicate fig- 
ures may be photographic or pencil-drawn copies. 


Manuscripts must conform to the style require- 
ments described in the Publication Manual of the 
American Psychological Association. 





Journal of Applied Psychology 


Published bimonthly by the 
American Psychological Association 
Prince and Lemon Sts., Lancaster, Pa. 
and 1333 Sixteenth Street N.W. 
Washington 6, D. C. 


$8.00 per volume 


$1.50 per issue 


Subscriptions, orders, and business communications should be addressed to the American Psychological Association, 1333 Sixteenth St. N.W., Washington 6, D. C. Address changes must reach the subscription office by the 10th of the month to take effect the following month. Undelivered copies resulting from address changes will not be replaced; subscribers should notify the post office that they will guarantee second-class forwarding postage. Other claims for undelivered copies must be made within four months of publication.


Entered as second-class matter, August 19, 1943, at the post office at Lancaster, Pa., under the act of March 3, 1879. Acceptance for mailing at the special rate of postage provided for in paragraph (d-2), Section 34.40, P. L. & R. of 1948, authorized October 10, 1947.


© 1957 by the American Psychological Association, Inc. 





Journal of Applied Psychology 











Vol. 41, No. 5

October, 1957








Vocational Interest Differences Among Engineers Employed in Different Functions ¹


Marvin D. Dunnette 


Minnesota Mining & Manufacturing Co., St. Paul 


Present demands for engineering, scientific, 
and other technically trained manpower have 
outstripped current supplies. Because of this, 
problems of guiding larger numbers of persons 
into scientific training and the expansion of 
current training facilities have received wide- 
spread attention. An area which is probably 
of equal importance but which has received 
less attention is the problem of increasing the 
efficiency of utilization of current technical 
manpower resources. 

Technical personnel undertake a wide va- 
riety of functions. Persons with scientific 
training are performing jobs ranging from 
routine analysis and testing to top-flight ex- 
ecutive jobs. They also are employed in func- 
tions ranging from basic research to applied 
development, production and process engi- 
neering, and even customer contact and sell- 
ing. If placement in these various functions 
is achieved by methods of trial and error 
or other personnel techniques of questionable 
validity, serious waste and misapplication of 
technical manpower is the unwanted result. 

Since engineers and scientists are performing and are continually being called upon to perform so many different industrial duties, it is reasonable to expect that effectiveness in these various functions will be related to important individual differences in patterns of abilities, interests, and personality traits. Discovery of such patterns can aid materially in fostering better utilization of our scientific manpower resources. This article describes first steps taken in an effort to develop special scoring keys on the Strong Vocational Interest Blank which may be used eventually as aids in more appropriately guiding students through their engineering education as well as in the placement of engineering and other scientific applicants in industry.

¹ Research reported in this article is part of a large-scale effort undertaken by the Industrial Relations Center at the University of Minnesota to develop special scoring keys on the Strong Vocational Interest Blank for four engineering functions. Research in this project has been supported, in part, by continuing grants from the Graduate School, University of Minnesota. The author wishes to express special thanks for advice and assistance received from Professor Donald G. Paterson and Dr. George W. England of the Industrial Relations Center and Mr. Kenneth Rose, who is responsible for the discriminant function analyses described in this paper.

Impetus for this study came from the highly 
relevant research performed by Webster, Winn 
and Oliver (2) in 1950. They administered 
the Strong Vocational Interest Blank to groups 
of 34 sales engineers and 52 research engi- 
neers. These two groups differed considerably 
in the expected direction on the Technical 
Science and Sales keys of the Strong test. It 
was also noted that research engineers who 
were rated as most successful scored higher 
on all keys of Groups I and II (Human and 
Technical Sciences), except Osteopath, than 
did research engineers who were rated as less 
successful. These findings suggest, then, that 
measures of vocational interest can prove use- 
ful in engineering job placement and perhaps 
even in the prediction of different levels of en- 
gineering job effectiveness. 


Methods 


During the last two and one half years, the Strong Vocational Interest Blank (SVIB) has been administered to over 1,600 engineers employed in 15 firms. In the course of collecting these data, the Strong test was administered to 238 engineers, scientists, and other technically trained persons employed with Minnesota Mining and Manufacturing Co. It was decided to examine the Strong Vocational Interest profiles of these 3M technical persons in order to determine the relative feasibility of continuing an item analysis of the major portion of the data.







The research design of this study is simple. On 
the basis of job analyses (1), the 238 3M technical 
people were divided into groups comprising pure re- 
search scientists, applied research and development 
engineers, production engineers, and sales and tech- 
nical service engineers. Each of these functional 
groups was subdivided further to provide validation 
and cross-validation samples. The groups formed 
and the numbers in each are summarized in the table 
below:

Function                                      Total N   Validation   Cross-Validation
                                                          Sample         Sample
Pure research scientists                         48         28             20
Applied research and development engineers       49         29             20
Process and production engineers                 50         30             20
Sales and technical service engineers            91         66             25


Strong Vocational Interest profiles for persons in the validation samples were examined in order to develop scoring keys for the four functions which would accurately place engineers in the cross-validation samples into their appropriate functional categories.

Two methods were investigated in the effort to discriminate among the four functions on the basis of SVIB scores:

1. Discriminant function analysis was used to write equations which could be used to score the interest profiles. This was done by using the means of each of the 11 occupational groups on the Strong as independent variables in predicting the dependent variable of engineering function.

2. Scoring weights were assigned to each of the 44 occupational scale scores of the SVIB. Weights were chosen which tended to maximize differences among the four engineering functions. Using this method, four scoring keys were developed, one for each of the engineering functions. Score distributions for each key were converted to standard score form to allow ready comparison among individual scores.

Each of the above methods of analysis was “tested” by application to the profiles of persons in the cross-validation samples.
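The second method (weighted scale scores converted to standard scores) can be illustrated with a minimal sketch. The weights, scale names, and profile values below are hypothetical and are not the keys developed in the study; a standard-score scale with mean 50 and standard deviation 10 is also an assumption, since the article does not state the scale used.

```python
from statistics import mean, pstdev

def key_score(scale_scores, weights):
    """Weighted sum of occupational scale scores for one scoring key."""
    return sum(w * scale_scores[scale] for scale, w in weights.items())

def to_standard_scores(raw, target_mean=50.0, target_sd=10.0):
    """Linearly rescale raw key scores to an assumed mean-50, SD-10 scale."""
    m = mean(raw)
    sd = pstdev(raw)
    if sd == 0:
        raise ValueError("all raw scores are identical; cannot standardize")
    return [target_mean + target_sd * (x - m) / sd for x in raw]

# Hypothetical weights for a "research" key (illustration only).
research_key = {"Physicist": 2, "Chemist": 1, "Sales Manager": -2}

# Hypothetical SVIB profiles for three persons (illustration only).
profiles = [
    {"Physicist": 31, "Chemist": 42, "Sales Manager": 20},
    {"Physicist": 26, "Chemist": 39, "Sales Manager": 25},
    {"Physicist": 15, "Chemist": 27, "Sales Manager": 34},
]

raw_scores = [key_score(p, research_key) for p in profiles]
standard_scores = to_standard_scores(raw_scores)
print(raw_scores)
print([round(s, 1) for s in standard_scores])
```

Putting every key on the same standard-score scale is what makes scores on different keys directly comparable for one person.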


Finally, in order to test the extent of validity generalization for the newly developed scoring keys, an independent sample of 60 technical persons was selected and the scoring keys applied to their SVIB profiles.


Results


Table 1 gives the mean standard scores and corresponding letter scores obtained by each of the four functional groups on each of the occupational scales of the Strong Vocational Interest Blank. It is evident upon examining information in this table that persons in these four groups score differently on various parts of the Strong test. For example, pure research scientists appear to have interests somewhat more similar to those of persons employed in basic scientific and theoretical areas, such as medicine, physics, mathematics and psychology. Development engineers show the same tendencies, although to a less striking degree, and also appear to be relatively more similar to persons engaged in application of scientific principles to areas such as engineering and production management. Sales and technical service engineers, as might be expected, score highest in selling occupations and in areas involving independent business management such as Pharmacist and Mortician. Production engineers appear to possess a sort of hybrid interest pattern lying between those of development and sales persons. This is an expected finding since production engineering involves activities cutting across both application of technical facts and working closely with other persons.

Table 2 shows the accuracy of placement achieved by applying the discriminant functions developed in the validation samples to the persons in the cross-validation groups.²

² Discriminant function analysis took the form of first discriminating between the combined research and development groups and the combined production and sales groups. Two additional functions were written in order to discriminate between research and development and between production and sales, respectively. The functions thus developed are shown below. The subscripts refer to the occupational area of the Strong Vocational Interest Blank (for example, X1 denotes the mean of scores obtained on the scales of Group I on the Strong Vocational Interest Blank).

Between A & B: Z = X1 - 3.5X2 - 8X5
    2.0X. + 2.3Xs
    1.7X6 - 6X; + 1X6
    2.8X0 + 14X10
    Cutting score: 3.63

Between 1 & 2: Z 5X1 + X2
    1.4X5 + 1X. 4 + 1.2Xe - .1X:4
    SXa - 3Xet + 3Xw t + .7Xu
    Cutting score: 37.16

Between 3 & 4: Z = X1 + 4.2X,
    2.4X + 9X4 - 4Xe + 3.2X0 - Xz
    3.1X4 + 10.8X - 6.0Xw - 4Xu
    Cutting score: 169.70

Key: Exp. Group 1: Research; Exp. Group 2: Development; Exp. Group 3: Production; Exp. Group 4: Sales; Exp. Group A: Groups 1 and 2 combined; Exp. Group B: Groups 3 and 4 combined.

Note: Entries in the above equations should be in the form of mean occupational group standard scores.
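The two-stage use of discriminant functions described in the footnote (first A versus B, then 1 versus 2 or 3 versus 4) can be sketched as follows. This is only an illustration under stated assumptions: the coefficients and cutting scores are hypothetical, not the published ones, and the rule that a score at or above the cutting score selects the first-named group is also assumed, because the article does not state the direction.

```python
from dataclasses import dataclass

@dataclass
class Stage:
    """One linear discriminant function with a cutting score."""
    coefficients: dict    # occupational area index -> weight (hypothetical)
    cutting_score: float  # hypothetical value
    above: str            # label if Z >= cutting score (assumed direction)
    below: str            # label otherwise

    def score(self, area_means):
        return sum(w * area_means[i] for i, w in self.coefficients.items())

    def decide(self, area_means):
        return self.above if self.score(area_means) >= self.cutting_score else self.below

def classify(area_means, a_vs_b, research_vs_development, production_vs_sales):
    """Apply the first function, then the function for the chosen branch."""
    if a_vs_b.decide(area_means) == "A":
        return research_vs_development.decide(area_means)
    return production_vs_sales.decide(area_means)

# Hypothetical functions over three occupational areas (illustration only).
a_vs_b = Stage({1: 1.0, 2: 1.0, 3: -1.0}, 50.0, "A", "B")
research_vs_development = Stage({1: 1.0, 2: -1.0}, 0.0, "Research", "Development")
production_vs_sales = Stage({2: 1.0, 3: -1.0}, 0.0, "Production", "Sales")

# Hypothetical mean standard scores for occupational areas 1 to 3.
person = {1: 40.0, 2: 45.0, 3: 30.0}
print(classify(person, a_vs_b, research_vs_development, production_vs_sales))
```

With these made-up numbers the first function gives Z = 55, which sends the person to branch A; the second gives Z = -5, so the result is "Development".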







Table 1 


Mean Occupational Scale Scores on SVIB Profile Obtained by Four Technical Groups 
Engaged in Different Functions 


Mean Standard Scores Corresponding Letter Scores 


Columns under each heading: Pure Research (N = 28), Applied Research (N = 29), Production (N = 30), Sales (N = 66)


Artist*** 23 19 14 14 
Psychologist*** 38 32 24 22 
Architect*** 28 29 19 16 
Physician*** 35 31 25 24 
Osteopath* 31 29 26 0) 
Dentist*** 28 27 21 16 
Veterinarian*** 20 21 19 27 
Mathematician*** 29 25 16 12 
Physicist*** 31 26 15 
Engineer*** 38 41 32 
Chemist*** 42 39 27 
Production Manager*** d 45 44 
Farmer . 38 35 
Aviator** 3: 41 34 
Carpenter** 27 27 
Printer*** : 35 
Math. Phys. Sci. Teacher*** 42 46
Ind. Arts Teacher** 25 28 
Voc. Agric. Teacher** 
Policeman* 33 
Forest Service Man 
YMCA Phys. Director*** 24 
Personnel Director* 3 44 
Public Administrator 
YMCA Secretary*** 
Soc. Sci. H.S. Teacher*** 
City School Superintendent**
Minister*** 
Musician*** 
CPA 27 
Senior C.P.A : . 41 
Accountant* : 36 
Office Man** ! 36 
Purchasing Agent*** 
Banker** 27 32 
Mortician*** 2 28 32 
Pharmacist*** 32 
Sales Manager*** d 29 34 
Real Estate Salesman*** 34 39 
Life Insurance Salesman*** 22 28
Advertising Man*** f 27 
Lawyer*** 24 25 
Author-Journalist*** 29 26 
President Mfg. Concern** 37 26


* F ratio significant at 5% level. ** F ratio significant at 1% level. *** F ratio significant at .1% level.







Table 2

Accuracy of Placement Achieved in Cross-Validation Groups by Application of Discriminant Functions Derived from Occupational Area Means of SVIB
(r_t = .84. Over-all accuracy = 66%)

Job Function                          Actual Job Function
Determined by Equations    Pure Research   Development   Production   Sales

Pure Research                   14              2             3          1


Development 5 13 6 











Production 1 ) 10 








Sales 1 





Totals 20 


The results in Table 2 show that use of discriminant functions resulted in considerably better than chance accuracy in the placement of engineers in their appropriate functional categories.

Table 3 shows the accuracy of placement achieved by the special scoring keys which were developed.³ The data shown in Table 3 yield a χ² of 95.3 and a tetrachoric r of .84, both of which are highly significant statistically. Analysis of misplacements gives further evidence of the accuracy with which these scoring keys pinpoint the appropriate functional category. Of the 27 persons misplaced by application of the scoring keys, 18 were placed in the functional category directly adjacent to the correct category. It may be concluded, therefore, that measured interests can give important clues concerning the placement of a technical person into the broad range of duties most compatible with his likes and dislikes.
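The placement figures quoted for the scoring keys can be checked with a few lines of arithmetic. The sample sizes and misplacement counts come from the text; the 25% chance baseline is an added assumption (uniform random assignment among four categories) and is not a figure from the article.

```python
# Cross-validation sample sizes as given in the Methods table.
cross_validation_sizes = {
    "Pure Research": 20,
    "Development": 20,
    "Production": 20,
    "Sales": 25,
}

total = sum(cross_validation_sizes.values())  # 85 persons
misplaced = 27                                 # from the text
adjacent_misplaced = 18                        # placed one category away

accuracy = (total - misplaced) / total
adjacent_share = adjacent_misplaced / misplaced

# Assumed baseline: guessing uniformly among four categories is right
# one time in four on average, whatever the group sizes are.
chance_accuracy = 1 / len(cross_validation_sizes)

print(f"over-all accuracy: {accuracy:.1%}")
print(f"share of misplacements in an adjacent category: {adjacent_share:.1%}")
print(f"assumed chance baseline: {chance_accuracy:.0%}")
```

The computed accuracy, 58 of 85, rounds to the 68% reported for Table 3.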

It will be noted from Tables 2 and 3 that the group least effectively identified is the production engineering group. This is further evidence of the hybridization of interests mentioned previously. Production people appear to be somewhat less distinctive than the other groups tested. This contributes undoubtedly to the relatively greater inaccuracy with which they may be identified on the basis of vocational interests.

³ In each case a man was assigned to the functional category for which he received the highest standard score. For example, in applying the four keys to a man’s vocational interest profile he might obtain the following four scores: Research, 71; Development, 66; Production, 47; Sales, 38. In this case, he would be classified as a research engineer.
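The assignment rule given in the footnote on the scoring keys (classify each man by his highest standard score) amounts to taking a maximum over the four key scores. A minimal sketch using the footnote's own example follows; the function name is hypothetical, and the handling of tied scores (first key listed wins) is an assumption, since the article does not discuss ties.

```python
def assign_function(key_scores):
    """Return the key with the highest standard score.

    Ties go to the key listed first, an assumed convention.
    """
    return max(key_scores, key=key_scores.get)

# Example scores taken from the footnote.
example = {"Research": 71, "Development": 66, "Production": 47, "Sales": 38}
print(assign_function(example))
```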


Validity Generalization 


A crucial test of the utility of these keys 
in their application to subsequent persons is 
whether or not they may accurately place per- 
sons from independently selected groups into 
their appropriate functional categories. Al- 
though a partial test of this is supplied by 
analysis of the cross-validation samples dis- 
cussed above, this supplies merely a test of 
the significance of the statistical relationships 
involved and gives little evidence concerning 
the accuracy of the keys applied on a broader 
scale to persons in different departments, 
laboratories, or companies. Because of this, 
it was deemed necessary to test these keys on 
independently selected samples. The Strong 
Vocational Interest Blank was, therefore, ad- 
ministered to 60 additional technical persons. 
The number of persons employed in each of 
the four functions was as follows: Pure Re- 
search—3; Development—13; Production— 
26; Sales—18. Persons included in the sales 
engineer group were made up of applicants 
for employment with 3M who had previously 
been working in sales engineering and who 
were currently applying for a job of this sort 
with 3M. 

Two methods were used to determine the 
accuracy of the keys in their application to 
persons in this sample: 

1. Distributions of scores on each of the 
keys for each of the groups were obtained and 
are shown in Table 4. 

2. The extent to which these keys accu- 
rately assign each person to his appropriate 
functional category was examined. These 
data are shown in Table 5. 

Study of information contained in these two tables shows that the keys do an effective and accurate job of assigning engineers to their appropriate functional groups. Mean scores on the various keys differ significantly for persons in different functions. This is shown clearly in Table 4, and the practical significance of these differences is emphasized by the over-all accuracy of placement of 62%. Data in Table 5 yield a tetrachoric correlation coefficient of .70. Validity generalization shown by these special keys is particularly impressive when it is noted that this sample was unduly weighted with production engineers, the group which shows the least distinctive pattern of vocational interests, and almost entirely lacking in pure research people, the group showing the most distinctive pattern of vocational interests.


Table 3

Accuracy of Placement Achieved in Cross-Validation Groups by Application of Scoring Keys Based on Occupational Scale Scores of the SVIB
(r_t = .84. Over-all accuracy = 68%)

Job Function                          Actual Job Function
Determined by Keys    Pure Research   Development   Production   Sales   Totals

Pure Research                                                              23
Development                 3                                              23
Production                                                                 15
Sales                                                              18      24
Totals                                                             25      85


Implications 


Major implications of this study are immediately apparent. It has been possible by examining the interest profiles of persons performing different technical and engineering functions to develop special scoring keys which accurately discriminate among persons in these four major functions. Interest measurement may, therefore, prove to be a powerful tool in more appropriately guiding students who are undertaking technical training and may also provide important clues to personnel administrators or employment managers in more appropriately placing engineering applicants on jobs on which they may be most productive and satisfied.

Since the relatively coarse keys developed in this study, based as they are on occupational scale scores and on a rough system of weighting, have proved effective in this fashion, it is likely that item analyses now under way on the Strong Vocational Interest Blank should provide even more powerful guides toward discriminating among engineers engaged in different functions. Broader recognition of differences existing among different kinds of engineers and increasing knowledge concerning the measurement of these differences should then form the basis for better engineering utilization. As research in these areas continues to accumulate, personnel managers should be able to do a better and better job of insuring an even flow of the right engineers into the right jobs. This, in turn, should reduce manpower losses due to costly methods of trial and error and other personnel techniques of questionable validity.


Table 4

Mean Scores and Standard Deviations Obtained by Engineers Performing Different Functions on the Four Special SVIB Scoring Keys

Rows: Pure Research Scientists (N = 3); Development Engineers (N = 13); Production Engineers (N = 26); Sales Engineers (N = 18); F
Columns: Pure Research Key (Mean); Development Key (Mean, SD); Production Key (Mean, SD); Sales Key (Mean, SD)

59.0
53.8
46.4

54.3
47.2

4.00   36.3   2.58   41.3   6.14
8.36   44.0   10.68   4.97
8.19   53.3   8.44   53.3   6.50
6.61   8.27   59.8   8.13
3.82*   11.16***

* Significant at 5% level.
** Significant at 1% level.
*** Significant at .1% level.


Table 5

Accuracy of Placement Achieved in Validity Generalization Sample by Application of Scoring Keys Based on Occupational Scale Scores of the SVIB
(r_t = .70. Over-all accuracy = 62%)

Job Function                          Actual Job Function
Determined by Keys    Pure Research   Development   Production   Sales   Totals

Pure Research
Development                 1
Production
Sales
Totals
18


Received February 5, 1957. 


References 


1. Dunnette, M. D., & England, G. W. A checklist for differentiating engineering jobs. Personnel Psychol., 1957, 10, 191-198.

2. Webster, E. C., Winn, A., & Oliver, J. A. Selection tests for engineers: some preliminary findings. Personnel Psychol., 1951, 4, 339-362.





Journal of Applied Psychology 
Vol. 41, No. 5, 1957 


Colors and Mood-Tones ¹


David C. Murray and Herdis L. Deabler 


Gulfport Division, VA Center, Biloxi, Mississippi 


Colors and mood-tones are frequently 
thought to be associated. The assumption of 
such association is basic to much of Rorschach 
interpretation, the interpretation of the chro- 
matic HTP and other projective techniques. 
It plays a part, implicit or explicit, in ad- 
vertising, packaging, designing, and other 
areas of commerce and has an important 
bearing on psychological effects of colors se- 
lected for interior decoration of rooms for liv- 
ing and working quarters. 

In a study on the relation between colors 
and mood-tones, Wexner (5) selected a list 
of eleven mood-tones. Each mood-tone con- 
sisted of two or more words which had been 
unanimously chosen by four judges as synonymous. The mood-tones were: Exciting, stimulating; Secure, comfortable; Distressed, disturbed, upset; Tender, soothing; Protective, defending; Despondent, dejected, unhappy, melancholy; Calm, peaceful, serene; Dignified, stately; Cheerful, jovial, joyful; Defiant, contrary, hostile; and Powerful, strong, masterful. (Each mood-tone will hereafter be referred to by its initial word.)
The subjects, 94 Purdue students, half of 
whom were male and half female, were asked 
to select the one color from the colors on the 
charts that they felt best represented the feel- 
ings described by the word groups. Colors 
could be used more than once or not at all. 
Eight colors, yellow, orange, red, purple, 
brown, blue, black, and green, were used. To 
avoid color stereotypes, the names of the 
colors were not mentioned. Results indicated 
that for each mood-tone certain colors were 
chosen to “go with” that mood-tone signifi- 
cantly more often than the remaining colors. 
Wexner raised the question of cultural and biological determinants of these associations of mood-tones and colors. The present study is an attempt to explore the cultural (more exactly, the subcultural) factor to see whether socioeconomic level and mental health play a role in such findings, and to see what associations are consistent despite variations in subculture, socioeconomic level, and mental health.

¹ An abbreviated version of the paper was presented at the 1956 convention of the American Psychological Association, Chicago, Illinois.

Procedure 


To obtain a cross-regional comparison with Wexner’s Purdue students, 25 Louisiana State University students (18 males and 7 females) were tested. It was assumed that age, intelligence, and socioeconomic status would be fairly consistent between the students of the two universities. Since Wexner had found no significant differences between sexes for her Purdue group, results for the L.S.U. students were also lumped together. To determine whether socioeconomic status was a factor, the test was administered to 69 male nursing assistants at the Gulfport Division of the Biloxi VA Center, to provide a comparison between relatively high and relatively low socioeconomic groups within the South. Although no statistics are available, the average age of the nursing assistants was probably somewhat greater than that of the students, and the age range was obviously greater. Intelligence, which is probably correlated with socioeconomic status, may be assumed to be considerably higher for the students.

To determine whether findings were applicable to neuropsychiatric patients, the test was also given to 108 unselected new male admissions to the Gulfport (neuropsychiatric) Division of the Biloxi VA Center. The vast majority of these patients are of southern origin and a majority are of a socioeconomic origin similar to that of the nursing assistants. Average intelligence of the patients may be somewhat higher than that of the nursing assistants, as may average socioeconomic origin, but it is felt that the differences are probably not very great. Thus results for nursing assistants and patients provide a comparison of two southern groups of roughly similar socioeconomic origin.

The colors, mood-tones, instructions, and procedure used were the same as for Wexner's group with two unavoidable exceptions. Wexner put her colors, in the form of 8½ × 11-in. pieces of art paper, on 30 × 40-in. pieces of light-gray cardboard. Due to the arrangements of the rooms where testing was to take place, it was necessary in the present study to mount all eight pieces of art paper on one piece of 28 × 44-in. cardboard in a 2 × 4 arrangement.
Also, it was not possible to match precisely two of 
the colors used. Both the red and the blue in the 
present study were slightly lighter than were Wex- 









Table 1 


Correlations Showing the Extent to Which Groups Agree in Selecting Various Colors as “Going With” the Given Mood-Tones


Groups Compared* 


Pat. 


Mood LSU 


35 
96 
94 
79 


70 
85 
34 
71 
Al 
ae 
9s 
54 
58 
26 
4% 


Exciting 
Secure 
Distressed 
Tender 
Protective 
Despondent 
Calm 
Dignified 
Cheerful 
Defiant 
Powerful 


40 
87 
54 

95 
41 
&3 


Key. Pat: Neuropsychiatric patients. 
University students. 


Pat 
Pur. 


65 
88 
39 
78 
35 
32 
72 
59 
40 


26 


Att: Nursing assistants. 


Att. 
LSU 


All 
Groups 


38 
46 
A7 
79 
59 
04 
4 
51 
55 
84 
62 


55 93 
34 
37 
97 


.66 


66 
88 
40 
84 
65 
62 
87 
68 
78 
61 
4 


— 25 
85 
74 
76 
AY 
AY 
41 
ao 
60 


AO 
95 
BY 
78 
53 


21 


Key. Pat: Neuropsychiatric patients. Att: Nursing assistants. LSU: Louisiana State University students. Pur: Purdue University students.

* Comparisons between pairs of groups are by rank-order coefficients of correlation (Rho); those between all four groups at once are by coefficients of concordance (W). Italicized W are significant beyond the .01 level.


ner’s colors. These differences, though probably of 
minor importance, should be kept in mind when 
comparing the results from the Purdue group with 
results from the other three groups. 

All testing was done in groups of varying sizes, 
the 25 L.S.U. students at one time, the nursing as- 
sistants in five groups ranging from 4 to 21 at a 
time, and the patients in thirteen groups of from 4 
to 12 at a time. 


Results 


To gain an over-all view of the relation- 
ships between the groups, and to see if they 
did actually differ from one another, chi 
squares were computed for each mood-tone. 
Colors were grouped into a miscellaneous
category to give expected frequencies of five 
or more. For ten of the mood-tones p was 
smaller than .01, indicating that for these 
mood-tones the groups did differ in their 
choice of colors. For the mood-tone protec- 
tive, p was greater than .05. In general, it 
appeared that for most of the mood-tones, 
there were significant differences between the 
groups in their choice of colors to go with the 
given mood-tone. 
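The group-by-color comparison described above is a chi-square test on a contingency table of groups against color categories. The sketch below computes only the statistic, its degrees of freedom, and the smallest expected frequency. The row totals match the four group sizes in the article, but every cell count is hypothetical, and the pooling of rarely chosen colors into "Other" only mimics the article's miscellaneous category.

```python
def chi_square(table):
    """Pearson chi-square for an r x c table of observed counts.

    Returns (statistic, degrees_of_freedom, smallest_expected_frequency).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    smallest_expected = float("inf")
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            smallest_expected = min(smallest_expected, expected)
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, df, smallest_expected

# Hypothetical counts for one mood-tone; columns are Red, Yellow, Other.
# Row totals equal the group sizes 94, 25, 69, and 108.
counts = [
    [60, 20, 14],  # Purdue students
    [15, 5, 5],    # L.S.U. students
    [40, 10, 19],  # nursing assistants
    [50, 25, 33],  # patients
]

statistic, df, smallest_expected = chi_square(counts)
print(f"chi-square = {statistic:.2f}, df = {df}")
print(f"smallest expected frequency = {smallest_expected:.2f}")
```

Checking the smallest expected frequency mirrors the article's rule of pooling colors until expected frequencies reach five or more.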

This finding left unanswered the questions of which groups agreed most closely in their choice of colors to go with mood-tones, and for which mood-tones the intergroup relationships were closest. An attempt was made to answer this by rank-ordering the colors chosen by each group for a given mood-tone and then obtaining rank-order correlations between each pair of groups. These are shown in Table 1. (In Table 1, italicized Rho are significantly larger than chance, allowing a t ratio of 3.0.)
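The rank-order correlation between two groups can be sketched as below: rank the eight colors by how often each group chose them, then correlate the ranks. The counts are hypothetical, and the use of average ranks for tied colors is an assumption, as the article does not say how ties were handled.

```python
def average_ranks(values):
    """Rank values from largest (rank 1) to smallest, averaging tied ranks."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared_rank
        pos = end + 1
    return ranks

def spearman_rho(x, y):
    """Rank-order correlation: Pearson correlation of the two rank lists.

    Undefined (division by zero) if either list is entirely tied.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

colors = ["yellow", "orange", "red", "purple", "brown", "blue", "black", "green"]
# Hypothetical choice counts for one mood-tone in two groups.
group_a = [3, 5, 40, 2, 1, 8, 4, 6]
group_b = [2, 4, 11, 1, 0, 3, 2, 2]

print(round(spearman_rho(group_a, group_b), 2))
```

A Rho near 1 means the two groups ordered the colors almost identically for that mood-tone.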

As might be expected, the two university 
groups show the largest number of significant 
correlations; the two southern groups, of 
roughly similar socioeconomic status, show 
the next largest number. There is less simi- 
larity between the two southern “normal” 
groups of differing socioeconomic level, sug- 
gesting perhaps that socioeconomic level is a 
more important factor to consider than either 
region of the country or level of mental health. 

Table 1 also indicates that for three of the 
mood-tones the correlations between all pairs 
of groups are significant. These mood-tones 
are Secure, Tender, and Calm. To determine 
more clearly for which mood-tones there was 
the most agreement throughout all four 
groups, Kendall’s (2) coefficient of concord- 
ance was determined for each mood-tone. 
All coefficients but one were significant at 
beyond the .01 level. As can be seen from 







Table 2 


The Percentage of Times Subjects Chose Each Color as “Going With” Each of the Eleven Mood Tones* 


Mood Tone 


Exciting, 
stimulating 


Secure, 
comfortable 


Distressed, 
disturbed 
upset 


Tender, 
soothing 


Protective, 
defending 


Purdue Students” 


Sig Color % 


A Red 4 

B Yellow 12 
Orange 11 
Green 


Blue 
Brown 
Green 


Yellow 


Orange 
Black 
Purple 
Brown 


Blue 

Green 
Yellow 12 
Purple 10 


Red 22 
Brown 18 
Blue 16 
Black 16 


Sig. 


A 
B 


Patients 
Color 


Red 
Yellow 
Green 
Blue 


Green 
Blue 
Yellow 


Brown 


Black 
Blue 
Red 
Yellow 


Blue 
Green 
Orange 
Other 


Green 
Brown 
Red 


Yellow 


Nursing Assistants 
Sig.* Color oy// 


A Red 

BK Black 
Orange 
Green 


Green 
Blue 
Brown 


Yellow 


Black 
Red 
Blue 
Yellow 


Green 
Yellow 
Blue 
Other 


Red 

Green 17 
Brown 14 
Blue 14 


S.U. Students 


Color % 
Red 
Yellow 
Purple 
Orange 


Blue 
Green 
Brown 


Yellow 4 
Black 40 


Brown 
Red 12 
Purple 8 


Blue AA 
Green 40 
Yellow & 
Purple 4 


Blue 28 
Brown 20 
Red 16 
Orange 12 


* To save space only the four colors most frequently chosen as “going with” each of the eleven mood tones are shown 
here. Frequencies have been converted into percentages for comparative purposes. 

b Data for the Purdue students are from Wexner (5). 

c Sig.: Statistical significance. The letters in this column indicate the results of a multiple comparison test of significance 
which is briefly described in the text. Colors in the same letter category did not differ from each other (at the .05 level of 
confidence) in the frequency with which they were chosen to “go with” the given mood-tone. Colors in the A category were chosen 
significantly more often as “going with” the given mood-tone than were colors in the B category. Colors in the B category were 
chosen significantly more often as “going with” the mood-tone than colors in the C category, and those in the C category 
significantly more often than those in the D category. 


Table 1, this coefficient is highest for the 
three mood-tones, Secure, Tender, and Calm. 
The coefficient is lowest for Distressed. 
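Kendall's coefficient of concordance, used above to gauge agreement across all four groups, can be sketched as follows. This uses the standard formula without a tie correction, which is an assumption since the source does not say how ties were treated; the rankings are hypothetical.

```python
def kendalls_w(rankings):
    """Coefficient of concordance W for m rankings of the same n objects.

    rankings: list of m lists, each holding the ranks 1..n assigned by one
    group to the n colors (no ties assumed).
    """
    m = len(rankings)
    n = len(rankings[0])
    rank_sums = [sum(col) for col in zip(*rankings)]
    mean_sum = m * (n + 1) / 2
    # S: squared deviations of each color's rank sum from the mean rank sum.
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Hypothetical ranks given by four groups to four colors for one mood-tone.
ranks = [
    [1, 2, 3, 4],
    [1, 3, 2, 4],
    [2, 1, 3, 4],
    [1, 2, 4, 3],
]
print(kendalls_w(ranks))
```

W runs from 0 (no agreement among the groups) to 1 (identical rankings).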

To determine which colors were associated 
most often with a given mood-tone, results 
for each group were transformed into percentages, 
which were in turn transformed into 
inverse sine scores (1). Tukey’s (4) procedure 
for making multiple comparisons among 
a set of observed means was adapted to make 
multiple comparisons among a set of observed 
frequencies in mutually exclusive categories 
(3), as was done by Wexner (5). A significance 
level of 5% was used in all cases. The 
results on the Purdue students were computed 
by Wexner (5). As was done by Wexner, 
colors were grouped according to the results 
of the multiple comparison tests. Colors in 
a given letter category were associated with 
the mood-tone significantly more often than colors 
in letter categories below them and significantly 
less often than colors in letter categories 
above them. Colors in the same letter 
category did not differ significantly from each 
other in frequency of association with the 
mood-tone. 
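The percentage-to-inverse-sine step can be illustrated with a short sketch. The source cites Eisenhart et al. (1) but does not state which scaling was used; the form below, arcsin of the square root of the proportion in radians, is one common version and is an assumption here. The function name and the percentages are hypothetical.

```python
import math


def inverse_sine_score(percentage):
    """Arcsine-square-root transform of a percentage (0-100), in radians.

    Assumed form: asin(sqrt(p)) with p the proportion. Other scalings
    (degrees, or twice this value) are also in use.
    """
    proportion = percentage / 100.0
    return math.asin(math.sqrt(proportion))


# Hypothetical percentages of one group choosing each of four colors.
for pct in (48, 28, 12, 4):
    print(pct, round(inverse_sine_score(pct), 3))
```

The transform makes the sampling variance of a percentage roughly independent of its size, so the transformed scores can be compared with a procedure designed for means.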

Results (see Table 2) suggest that certain 
colors do have a general affective meaning for 
all groups, and also that one color may have 
much the same affective significance as an- 
other. In the present study the latter conclu- 
sion is particularly true of blue and green. 
Both blue and green are rather consistently 
related to Secure, Tender, and Calm, and 
rather consistently not related to Defiant. 
Red and black are the colors which are most 





Table 2 (Continued) 


Purdue Students” Patients Nursing Assistants L.S.U. Students 


Mood Tone Sig’ Color % Color Color 


% Sige Color % 


Sig.* 


Black 


Brown 


Despondent, A 28 
28 
12 
12 


Black A 
Blue B 
Red d Cc 
Other 


Blue 43 A 
Black 22 B 
Purple 9 
Brown 7 


Black 
Purple 20 
Brown 20 
Yellow 4 


48 
dejected, 
B 


unhappy, Purple 


melancholy Blue 


Blue 40 
Green 33 Blue 

Yellow 9 Other 
Purple 7 Yellow 


‘alm, Green Green 48 
Blue 32 
Yellow 12 


Orange 4 


Green 
Blue 
Yellow 
Orange 


peaceful, 


serene 


Dignified, 48 


32, 


32 
24 


Brown Brown 
Purple 
Black 


Blue 


Purple 
Black 
Blue 
Brown 


Purple 
Blue 

Black 
Red 8 


stately Purple 
Red 


Yellow 


Cheerful, 


jovial, 


Yellow 
Red 
Orange 


Red 
Green 
Yellow 
Blue 


Yellow 
Red 32 
Orange 12 
Blue s 


Green 44 
Yellow 
Red 


Orange 


joyful 17 


12 


Green 


Defiant, Red 
Orange 
Black 


Brown 


Red 
Black 19 
Other 14 
Yellow 9 


34 Red 
Black 


Brown 
Other 


Red 48 
Black 28 
Purple 8 


contrary, 
hostile 
Orange 4 
Powerful, Black 
Red 
Purple 
Blue 


Red 


Green 


31 
14 
13 
11 


Red 
Brown 
Black 
Blue 


Red 


Brown 


32 
24 
20 
12 


strong, 
masterful Brown 


Black 


Purple 


Black 


consistently associated with certain mood-tones 
by all groups. Red is generally seen as Exciting, 
Cheerful, Defiant, and Powerful, and 
is generally not related to Secure, Tender, 
and Calm. Black is seen as Distressed, Despondent, 
and Defiant, and is consistently not 
associated with Tender, Calm, and Cheerful. 
The major difference between red and black 
is that red is seen as Cheerful, whereas black 
is consistently not chosen as Cheerful. 

Brown is Protective, but is not Exciting or 
Cheerful. Purple is Dignified, but unlikely 
to be seen as Secure or Cheerful. Yellow is 
Cheerful, but rarely seen as Powerful. Orange 
is not closely related to anything, but is rarely 
related to Despondent, Dignified, or Powerful. 

In addition to these general findings, certain 
instances where one or two groups showed 
very strong associations between a color and 
a mood-tone, and other groups did not, may 
be noted. Thus the Purdue students saw 
orange as Distressed, and also as Defiant. 
The other three groups made these two associations 
very rarely. The two non-university 
groups were most likely to relate brown to 
Dignified, and the two university groups were 
relatively unlikely to do this. The non-university 
groups thought of green as a Cheerful 
color, the university groups did not. Half of 
the Purdue students picked black as Powerful, 
and less than 15% of each of the other 
groups made this association. 

One question which arises is whether there 
is a tendency for certain groups to choose 
certain colors, regardless of the mood-tone involved. 
To answer this, a chi square was computed 
between groups and colors. It was significant 
at well beyond the .01 level. Two 
findings stood out. One was that patients are 
particularly likely to associate colors that are 
not actually before them to the mood-tones, 
and to otherwise fail to follow directions. 
Representative of some of the other answers 
they gave are: amber, gold, tan ivory, white, 
pale, and cream, and also such combination 
responses as: “black-purple,” “gole (sic) 
brown,” “blue and orange,” “purple green 
gray black gray,” “deep blue blue black yellow 
trim,” and “Cool, Comb (sic) and Collected.” 

A second finding was that regardless of 
mood-tone, the Purdue students seemed to 
choose orange, purple, and black far more 
than did the other groups, and they chose 
red and green far less. Both patients and 
aides chose purple far less than the other 
groups. The aides chose green and blue far 
more than the other groups. 

For all groups put together, orange and 
purple were the least frequently chosen colors 
and red was by far the most frequently chosen 
color. This last finding may of course be a 
function primarily of the particular mood- 
tones used. 


Discussion 


The present data strongly suggest that peo- 
ple do associate colors and mood-tones in 
their minds. The range of these associations 
suggests that such associations are far from 
uniform from one person to the next. The 
fact that there are group differences in the 
associations strongly suggests that the asso- 
ciations are, at least in part, learned rather 
than inborn. It would appear that in both 
the commercial world and in projective test- 
ing the particular subgroup involved must be 
considered. Present results suggest that socio- 
economic differences are of particular impor- 
tance. 

It should be recognized that present re- 
sults apply only to one shade of each color. 
Further research is needed to determine to 
what extent results are specific to one shade 
of a color, and how far they can be general- 
ized to all shades of that color. Comparisons 
between more than two regions of the coun- 
try and between more finely differentiated 
socioeconomic groups would also be informa- 
tive. 


Summary 


Neuropsychiatric patients and nursing as- 
sistants in a southern hospital and students 
in a southern state university were presented 
with eight stimulus colors and a list of eleven 
moods and asked to pick a color to go with 
each of the moods. Results were compared 
with those for students in a northern univer- 
sity which have been previously published by 
Wexner (5). For nine out of the eleven 
mood-tones, chi squares revealed highly significant 
differences between the four groups 
in their over-all associations of colors and 
mood-tones. In general, socioeconomic differences 
appeared to be more important in 
causing differential choice of colors to go with 
mood-tones than were either mental health 
differences or differences in geographical regions 
within this country. 

Certain colors were found to have about 
the same affective meaning for all groups. 
In other instances there were sharp group differences 
in the extent to which they associated 
a given color with a certain mood-tone. 

It was found that certain groups seemed to 
have a general response tendency to choose 
certain colors, regardless of the mood-tone involved. 

Finally, there were some colors which appeared 
to be relatively popular with all groups 
and some which appeared to be relatively unpopular 
with all groups. 


Received September 27, 1956. 


References 


1. Eisenhart, C., Hastay, M. W., & Wallis, W. A. 
Techniques of statistical analysis. New York: 
McGraw-Hill, 1947. 

2. Kendall, M. G. Rank correlation methods. London: 
Charles Griffin, 1948. 

3. Nair, K. R. The distribution of the extreme 
deviate from the sample mean and its studentized 
form. Biometrika, 1948, 35, 118-144. 

4. Tukey, J. W. Comparing individual means in the 
analysis of variance. Biometrics, 1949, 5, 99-114. 

5. Wexner, Lois B. The degree to which colors 
(hues) are associated with mood-tones. J. 
appl. Psychol., 1954, 38, 432-435. 





Journal of Applied Psychology 
Vol. 41, No. 5, 1957 


Interests of Fathers and Sons¹ 


Edward K. Strong, Jr. 
Stanford University 


The original purpose of this investigation 
was to ascertain whether age of fathers affected 
father-son resemblance as regards 
their interests. In three earlier investigations 
the fathers were 20 to 40 years older than 
their sons. In this investigation fathers and 
sons were both college students at Stanford 
University when tested; in almost every case 
the fathers were tested before the birth of 
their sons. 

Growing out of the above research have 
come three other questions: First, how much 
resemblance is there between fathers and 
sons? Second, how may spurious resemblance 
be eliminated? Third, how does resemblance 
in terms of interests compare with resem- 
blance in terms of other factors? As a by- 
product some data are given concerning the 
resemblance between students’ occupational 
plans and their subsequent occupations, which 
afford a measure of the validity of occupa- 
tional choice. 


Five Investigations 


In 1931 Forster (3) reported correlations 
between occupational interest scores of 125 
fathers and sons, also between interest pro- 
files. The sons were students at the Univer- 
sity of Minnesota; nearly all were freshmen. 

The second investigation was completed by 
the writer in 1935 but not reported until 1943 
(8, p. 680). It compared the interest scores 
of 110 pairs of fathers and sons on 22 scales. 
The sons averaged 22 years of age and the 
fathers 58 years of age. Part of this large 
difference of 36 years in age was caused by 
the fact that 74 of the sons filled out the 
blank in 1927 and their fathers in 1935, thus 
increasing the difference in age between them 
by eight years. The investigations of Forster 
and Strong, 1935-43, utilized the original 
scales. The responses of 100 pairs of fathers 
and sons have been rescored on 26 revised 


1 Read in part before the Western Psychological 
Association, March 29, 1956 




scales. For convenience these data are desig- 
nated as the third investigation and referred 
to as “Strong 1935.” In the few instances 
where reference is made to data based on the 
original scales, the data will be labeled 
“Strong 1935-43.” 

The fourth investigation was that of Gjerde 
whose Ph.D. thesis was accepted at the Uni- 
versity of Minnesota in 1949 (4). His sub- 
jects were 184 9th and 10th grade students 
of both sexes in the Laboratory School of the 
University of Chicago, but only sons are con- 
sidered here. Their mean IQ was 136. The 
students averaged 14.6 years of age, their 
fathers 48.9 years. 

The present and fifth investigation referred 
to as “Strong 1956” involved former students 
at Stanford University between 1927 and 1934 
who had sons at least 17 years of age. Of 985 
former students whose Vocational Interest 
Blanks were in our files, 195 were dead or 
their addresses unknown. Letters were mailed 
to the remaining 790 men explaining the proj- 
ect and asking that they reply on an enclosed 
postal card as to how many sons they had 17 
years of age and older. No replies were re- 
ceived from 177 men, 199 reported sons of 
the requisite age, and 472 reported no sons of 
such age. Blanks were returned by 182 sons. 
The data reported below are based on 131 
fathers with one son, 20 fathers with two 
sons, 1 father with three sons, and 2 fathers 
with four sons—a total of 154 different fathers 
and 182 sons. The fathers ranged from 17 to 
41 years of age when originally tested, aver- 
aging 23.8 years, sigma of 3.9. Sons ranged 
from 17 to 35 years, averaging 21.3 years, 
sigma of 3.7 years. Sixty-two per cent of 
fathers and sons differed in age when tested 
by 0 to 3 years; only 19% differed by more 
than 5 years. 

Father-son resemblance has been measured 
in four different ways, namely, in terms of 
(a) interest profiles, (b) scores on occupa- 







tional scales, (c) items, and (d) occupational 
plans and subsequent occupations. 


Resemblance Based on Interest Profiles 


Correlation between interest profiles indi- 
cates the extent to which fathers and sons 
have the same rank-order of occupational 
scores. 

The distribution of Pearsonian coefficients 
between interest profiles of 36 scores for each 
of the 182 pairs in the fifth investigation is 
given in Table 1. The correlations range 
from — .67 to .92 and average .41. Averages 
and standard deviations of the resemblance 
coefficients were calculated in terms of Z 
scores and expressed finally in terms of cor- 
relations. The resemblance coefficients of 
100 chance pairs of fathers and sons are also 
given in the table. They average .24. The 
difference between .41 and .24 is significant 
at the .01 level. It is possible to calculate 
the significance of the difference since both 
correlations are averages of many coefficients 
and the significance of the difference can be 
calculated in terms of the standard error of 
the difference. 
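Averaging correlations by way of z scores, as described above, can be sketched as follows. The function name `average_r` and the list of coefficients are hypothetical; the transform itself is Fisher's z, the inverse hyperbolic tangent of r.

```python
import math


def average_r(correlations):
    """Average a set of correlations via Fisher's z transform.

    Each r is converted to z = atanh(r), the z values are averaged,
    and the mean z is converted back to a correlation with tanh.
    """
    z_scores = [math.atanh(r) for r in correlations]
    mean_z = sum(z_scores) / len(z_scores)
    return math.tanh(mean_z)


# Hypothetical resemblance coefficients for a handful of father-son pairs.
sample = [0.92, 0.55, 0.38, 0.10, -0.20]
print(round(average_r(sample), 2))

# A mean z of .25 corresponds to a correlation of about .24.
print(round(math.tanh(0.25), 2))
```

Working in z units keeps very high coefficients from being under-weighted, since the sampling distribution of r is skewed near its limits.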

Forster obtained similar data for his 125 
fathers and sons. We have calculated the av- 
erages of his correlations to be .63 for true 
pairs and .42 for chance pairs, the difference 
of .21 being significant at the .001 level. 

If the correlation coefficients of .38 and 
— .38 are used as cutting-points in the dis- 
tribution of coefficients in Table 1, it can be 
said that half of the 182 sons have interests 
rather similar to their fathers, that 5% have 
interests rather dissimilar to their fathers, and 
41% have interests approximating a chance 
relationship to their fathers. But if correla- 
tions between chance pairs are taken into con- 
sideration, then about one-fourth of sons have 
interests somewhat similar to their fathers 
and the remainder resemble their fathers no 
more than chance would give. 


Spurious Resemblance 


If a father and son are both physicians, 
one is apt to point to that fact as evidence of 
father-son resemblance. The son may have 
become a physician because he inherits traits 
common to his father, or because of common 


Table 1 


Distributions of Pearsonian Correlations Between 
Interest Profiles of Strong 1956 
Fathers and Sons 


Fathers and Sons 


Chance 
(Total 100) 


True 
(Total 182) 


.66 

54 

38 

20 

00 

-.20 

38 

-.54 

06 

76 

Mean z : 25 

Sigma z : 51 

Mean r .41 .24 
Net diff. (see text) = .33 


family environmental influences, or because 
of factors unrelated to his father as, for example, 
association with fellow students interested 
in medicine. All three types of influences 
are presumably present in varying 
degrees in each case. Psychologists have not 
yet disentangled hereditary and environmental 
influences. But we should make some allowance 
for environmental factors common to a 
given sample of fathers and sons. No better 
method occurs to us than to determine the 
resemblance between fathers and sons selected 
by chance. Deduction of the resemblance 
between chance pairs from the resemblance 
between true pairs should give a better 
estimate of father-son resemblance. 

Correlations between interest profiles are 
influenced by the fact that fathers and sons 
belong to the same socioeconomic environment. 
Few of them have, for example, the 
interests of carpenters, one of the scores included 
in the profile. This fact will raise the 
correlation coefficients between fathers and 
sons. Correlations between profiles are also 
influenced by the fact that few men have 
high scores on the psychologist and mathematician 
scales and many men have fairly 
high scores on the realtor scale. Here again 
the effect of such scores is to raise the correlations. 

If interests are purely the resultant of en- 
vironment, then presumably the correlations 
based on chance pairs, given in Table 1, can 
be considered to measure resemblance com- 
mon to men as such, rather than resemblance 
peculiar to father-son pairs. On the other 
hand, if the sons inherit the same tendencies 
as have their fathers not to be carpenters, for 
example, then the “chance” coefficient is too 
high as a measure of common environment. 

How shall the “chance” resemblance coefficient 
of .24 be deducted from the true-pair 
coefficient of .41 to give a net resemblance 
coefficient? Various procedures have been 
recommended, all but one giving the same 
net difference.² The simplest procedure is to 
subtract the spurious variance (chance pairs) 
from the obtained variance (true pairs). The 
formula would be 

$$\sqrt{r_{\text{true pairs}}^{2} - r_{\text{chance pairs}}^{2}}.$$ 

The difference between .41 and .24 would 
then be .33. When the data from Forster are 
similarly treated we have correlations, respectively, 
of .63 and .42 and the difference of .47. 
We shall refer to such a difference as the net 
difference. 
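The net difference can be checked numerically with a short sketch. The function name `net_difference` is hypothetical; the computation is the subtraction of squared correlations described in the text, followed by a square root.

```python
import math


def net_difference(r_true, r_chance):
    """Square root of (true-pair r squared minus chance-pair r squared)."""
    variance_gap = r_true ** 2 - r_chance ** 2
    if variance_gap < 0:
        raise ValueError("chance-pair correlation exceeds true-pair correlation")
    return math.sqrt(variance_gap)


# Interest-profile correlations reported in the text.
print(round(net_difference(0.41, 0.24), 2))  # Strong 1956
print(round(net_difference(0.63, 0.42), 2))  # Forster
```

The same function reproduces the net differences quoted elsewhere in the article for item responses and for occupations.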


Resemblance Based on Scores of 
Occupational Scales 


A summary of the father-son resemblance 
coefficients concerning occupational interest 
scores from the five investigations is given in 
Table 2. Original scales were used in the first 
two investigations and revised scales in the 
other investigations. The data are restricted, 
however, to the 20 scales common to all the 
investigations. Summaries based on all scales 
agree within .02 from those given in the table. 
The detailed data are given in Tables 2a 
and 2b.³ 

2 The writer is indebted to K. E. Clark, H. S. 
Conrad, Q. McNemar, P. E. Meehl and A. C. Tucker 
for their analyses of this problem. 

3 A 2-page table giving all correlations from the 
five investigations has been deposited with the American 
Documentation Institute. Order Document No. 
§282 from ADI Auxiliary Publications Project, Photoduplication 
Service, Library of Congress, Washington 25, 
D. C., remitting in advance $1.25 for microfilm 
or $1.25 for photocopies. Make checks payable 
to Chief, Photoduplication Service, Library of Congress. 

Because the data of Gjerde concerning 14-year-old 
boys differ so greatly from those of 
the other four investigations they are disregarded 
except in the section concerning the 
effect of age of sons upon father-son resemblance. 
The average of the 102 remaining 
coefficients is .30, sigma of .09. Only seven 
coefficients are less than .15, and none are 
negative. As a check on the Strong 1935-43 
data, the scores of chance pairs of fathers 
and sons were correlated. The average of 
such correlations was -.03. 

The resemblance coefficients from various 
scales in Table 2 range from .00 to .49. It 
cannot be maintained that the same pairs of 
fathers and sons vary in their resemblance on 
the same test as much as the variations in 
their correlations indicate. Forster and Roff 
(7) have considered several explanations for 
the wide variations. The data provide adequate 
support, however, for only one of the 
explanations, possibly because the samples are 
too small for the purpose. But the data do 
indicate that reliability of the scales explains 
in part the variations among coefficients. The 
correlation between 134 resemblance coefficients 
and odd-even reliability is .44 and the 
correlation between 50 resemblance coefficients 
and test-retest over 18 years is .61. Disregarding 
the coefficients of Gjerde, the average 
resemblance between father and sons is .34 
for the more reliable scales and .27 for the 
less reliable scales; or .35 for the scales with 
highest test-retest reliabilities and .26 with 
the lowest test-retest reliabilities. One can 
also estimate the effect of reliability of scales 
upon resemblance coefficients by correcting 
each coefficient for attenuation. If this is 
done, the average of .30 is raised to .35. It 
seems appropriate to use the average of .35 
rather than that of .30 to express the resemblance 
of fathers and sons when measured by 
scores on occupational scales. 
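Correcting a coefficient for attenuation can be sketched as below. The text does not give the formula or the reliabilities used, so this assumes the classical correction (the observed correlation divided by the square root of the product of the two reliabilities). The reliability value of .86 is hypothetical, chosen only to show how an observed .30 could rise to about .35.

```python
import math


def correct_for_attenuation(r_observed, reliability_x, reliability_y):
    """Classical correction for attenuation (an assumed form, see lead-in)."""
    return r_observed / math.sqrt(reliability_x * reliability_y)


# Hypothetical: the same scale reliability applies to fathers' and sons' scores.
assumed_reliability = 0.86  # illustrative value, not taken from the article
corrected = correct_for_attenuation(0.30, assumed_reliability, assumed_reliability)
print(round(corrected, 2))
```

Lower reliabilities produce larger corrections, which is consistent with the finding above that the less reliable scales show lower resemblance coefficients.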







Table 2 


Father-Son Resemblance Based on Occupational Scores 
(Pearsonian Correlations) 


Original Scales 


Strong 
Forster 1935 43 
N = 125) (N 


Range of Coefficients OO 49 11-.48 
Mean z 34 30 
Sigma z 12 10 


Mean r 33 29 


Resemblance Based on Items 


Offhand it would seem that the best meas- 
ure of interest resemblances would be in terms 
of the items comprising the Vocational Inter- 
est Blank, but unfortunately there is no really 
good way to measure responses to items. 
Whether the responses of many fathers are 
compared with the responses of their sons to 
a single item, or the responses of a single pair 
to many items, there results a ninefold table, 
which must be reduced to a fourfold table. 
Tetrachoric correlations are impossible or 
questionable in half the cases because one of 
the four cells contains five or fewer cases. 

Interest scales represent a pattern of inter- 
ests. About an equal number of items are 
weighted plus, minus, or not at all. In at- 
tempting to measure resemblance in terms of 
items, this pattern is ignored—all items are 
given equal weight or all pairs are given 
equal weight. In the latter case a father and 
son who are both physicians ought to agree, 
but if one of the two is a salesman they ought 
not to agree. Theory is useful, but in the last 
analysis it is experimentation that settles the 
issue. With this in mind we report the extent 
to which fathers and sons resemble each other 
in their responses to items. 

Resemblance per item. The like, indifference, 
and dislike responses of 100 fathers were 
plotted against the responses of their sons on 
each of the 400 items separately. The nine- 
fold tables were reduced to fourfold tables by 
contrasting like-like responses with the re- 
maining responses, and similarly dislike-dis- 
like responses with the remaining responses. 
Tetrachoric correlations were calculated for 


= 110) 


Revised Scales 


Strong 
1935-56 
(N = 100) 


Strong 
1956 
= 100) 


Gierde 
(N = 94) (N 


13--.45 
34 


03-46 


13 


both distributions and the two coefficients 
averaged. The resulting 400 averages were 
divided into two groups—the first consisted 
of 185 items in which there was no cell con- 
taining 5 or less cases, the second included 
the remaining 215 items (see Table 3). The 
average correlation of 400 items is .16. The 
truer estimate is .21 based on 185 items. 
Both coefficients differ significantly from zero 
at the .001 level. 

We have little confidence in all these tetra- 
choric correlations for the 400 items consid- 
ered separately. The two coefficients for each 
item correlate .54, providing a reliability of 
only .70 for the averages. 

The responses of 100 fathers were also 
plotted against chance sons on 100 items 
(Items 1-70 and 281-310). Averages of 100 
correlations give .19 for true pairs and .02 
for chance pairs; the two measures are significantly 
different at the .001 level. 

Instead of averaging 100 coefficients, the 
100 distributions may be averaged and a sum- 
mary correlation calculated. This procedure 
gives .31 for true pairs and .16 for chance 
pairs. The net difference is .27 (see section 
on Spurious Resemblance for its derivation). 

Another measure of resemblance is percentage 
of identical responses. This is the sum of 
L-L, I-I, and D-D responses of the 100 fathers 
and sons. If fathers and sons agreed perfectly, 
there would be 100% identical responses 
and if they disagreed perfectly there 
could be -100% identical responses, but 
most persons agree fairly well in their interests 
so that negative measures are unlikely. 
By pure chance we would have 33⅓% identical 
responses, not zero. But because there are 
items like “floorwalker,” to which only 2% of 
men in general respond by liking, and “explorer” 
to which 50% to 76% of various 
groups respond by liking, the “true chance 
for men in general” would differ somewhat 
from 33.3% depending upon which items 
were involved. An approximation of such a 
statistic is supplied by determining the percentage 
of identical responses of chance pairs 
of fathers and sons. 

The obtained percentage of identical responses 
is 43.3 for the 100 items with true 
fathers and sons, and 38.3 for chance pairs. 
The difference, although small, is significant 
at the .001 level. 

Evidently when the responses of many 
fathers and sons are summarized regarding a 
single item, there appears on the average very 
little resemblance between them. Taking 
chance into account, it appears from the data 
in Table 3 that fathers and sons respond alike 
(correlation of .20 to .60) on 38% of items 
and respond on a chance basis to the remainder. 


Table 3 


Father-Son Resemblance for 400 Items 
(Tetrachoric Correlations) 


             100 Fathers-Sons on 400 Items           100 Pairs on 100 Items 
z Scores   Significant    Non-Significant   Total    True Pairs   Chance Pairs 
           Correlations   Correlations 

  .60            1               0             1 
  .40           19               9            28 
  .20           80              58           138 
  .00           73             109           182 
 -.20           11              32            43 
 -.40            1               7             8 

Total          185             215           400 

Mean z                         .13           .17 
Sigma z 
Mean r                         .13 


Resemblance per pair of fathers and sons. 
In the preceding section the responses of 
many pairs of fathers and sons to single items 
were correlated. In this section the correlation 
is between the responses of single pairs 


of fathers and sons to many items. Responses 
of 50 fathers to 100 items (Items 1-70 and 
281-310) were correlated with the responses 
of their sons. Employing the same procedure 
previously described, there resulted two cor- 
relations for each pair, which were averaged 
to typify the pair. Six of the 100 correlations 
involved 5 or less cases in one cell. The 
mean of the 50 tetrachoric coefficients is .31, 
range — .30 to .69; only four coefficients are 
below zero. 

For chance pairs of fathers and sons the 
mean coefficient is .16, range — .29 to .59. 
Both mean coefficients of .31 and .16 differ 
significantly from zero (.001 level), but the 
two differ from each other at only the .01 
level. The net correlation difference between 
true and chance pairs is .27. 

The percentage of identical responses of 50 
true pairs is 42.2 and of chance pairs it is 
38.0. The difference of 4.2 is significant at 
the .01 level. 
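The percentage of identical responses is simple to compute once each response is coded as like (L), indifferent (I), or dislike (D). The sketch below is illustrative only: the function name and the response strings are hypothetical, and no significance test is attempted.

```python
def percent_identical(father_responses, son_responses):
    """Percentage of items on which father and son gave the same response.

    Responses are coded "L" (like), "I" (indifferent), or "D" (dislike);
    L-L, I-I, and D-D pairs count as identical.
    """
    if len(father_responses) != len(son_responses):
        raise ValueError("both members of the pair must answer the same items")
    matches = sum(f == s for f, s in zip(father_responses, son_responses))
    return 100.0 * matches / len(father_responses)


# Hypothetical responses of one father-son pair to eight items.
father = list("LLIDDLID")
son = list("LIIDLLDD")
print(percent_identical(father, son))
```

Running the same function on randomly matched fathers and sons gives the chance baseline against which the true-pair percentage is compared.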

Whether tetrachoric correlation or percent- 
age of identical responses are used as meas- 
ures of resemblance, the conclusion is that 
fathers and sons respond to items very much 
as all men do but that there is somewhat 
greater agreement in their responses (correla- 
tion of .27) than would be expected if they 
were merely two groups of unrelated men. 







Resemblance in Terms of Occupations 


The fourth method of measuring father-son 
resemblance involves comparison of their oc- 
cupational careers, since occupational choice 
is to some extent an expression of interest. 
Only the 1935 data supply information about 
the occupations in which both fathers and 
sons were engaged. The information con- 
cerning the sons was obtained in 1949, about 
twenty years after they had been tested. Un- 
fortunately, this information is missing for 
either father or son in 17 pairs so that the 
calculations are based on only 83 pairs. With 
such few cases it is impossible to compare 
fathers and sons in terms of specific occupa- 
tions. What has been done is to contrast 
those engaged in physical science occupations 
with those not so engaged, and so on for four 
other broad classifications, such as biological 
science occupations; law, author and advertis- 
ing; business; and selling; and to determine 
the tetrachoric correlations for each such comparison. 
The correlation based on physical 
science occupations is .03. The average of 
five such coefficients is .47 (median of .45). 
(See Table 4.) The corresponding average 
coefficient for chance pairs is .16, with a net 
difference of .44. 

Since several distributions have very few 
cases in one cell, the five distributions have 
been combined, giving the summary coefficient 
of .37. The corresponding coefficient for 
chance is .21, yielding a net resemblance of 
.30. The reader can take his choice whether 
he accepts .44 or .30 as the better measure of 
resemblance. 


Effect of Age of Fathers Upon Resemblance 

The original objective of the present in- 
vestigation was to determine the effect of age 
of fathers upon father-son resemblance by 
contrasting the resemblance of the 1935 
fathers and sons with that of the 1956 pairs. 
The 1935 fathers averaged 36 years older 
than their sons at time of testing whereas the 
1956 fathers averaged only 2 years older than 
their sons. Unfortunately the two contrasting 
pairs differed in two other ways. The 1935 
fathers and sons were tested between 1927 
and 1935, but an interval of 26 years inter- 


Table 4 


Occupations of Strong 1935 Fathers vs. Occupations 
of Sons in 1949 


                                        Strong 1935 
                                     True       Chance 
                                     Pairs      Pairs 

N                                     83         83 
Average of 5 r’s                     .47        .16 
Median of 5 r’s                      .45        .16 
r of five distributions combined     .37        .21 


vened between the testing of the fathers in 
1927—30 and of the sons in 1956. Social, 
economic and political conditions had shifted 
somewhat between 1927-30 and 1956. How 
much interests are affected by such conditions 
is unknown. We suspect the effect is not as 
great as is generally believed, particularly 
among men over 21 years of age. Kelly (6) 
reports few significant changes in 18 years be- 
tween husbands and wives qn many person- 
ality measurements, including interest scores, 
and such changes were relatively small. We 
suspect that the most important difference 
between the 1935 and 1956 pairs is that the 
sons in both investigations and the 1956 
fathers were all Stanford students, whereas 
the 1935 fathers were for the most part not 
former Stanford students nor students at any 
college; a small minority were employed in 
low status occupations. On this basis we 
should expect resemblance in occupa- 
tional activities between the 1935 fathers and 
sons than between the 1956 fathers and sons, 
which is demonstrated by our data. 


less 


In terms of correlations of scores on occu- 
pational scales the average coefficient was .33 
for the 1935 pairs and .29 for the 1956 pairs 
(see Table 2). The difference of .04 is not 
statistically significant. In terms of mean 
scores, the 1935 fathers differed more from 
their sons than the 1956 fathers from their 
sons. The 1956 fathers differed significantly
at the .001 level from their sons on only
three among 29 scales, scoring higher on the
lawyer and lower on the physician and carpenter
scales. There are four additional differences
at the .05 level, which may be ignored
as they amount to 3 or less standard
scores and are of little or no practical signifi- 
cance. In contrast, 1935 fathers differed sig- 
nificantly at the .001 level from their sons on 
7 among 26 scales, scoring higher on the 
banker and lower on the personnel, advertis- 
ing, chemist, office, psychologist, and CPA 
scales. They differ also at the .01 level on 
three scales and at the .05 level on two scales. 
These differences in mean scores indicate that 
the older fathers differ from their sons more 
than the fathers who were tested while in col- 
lege differ from their sons. 

Father-son resemblance in terms of occupa- 
tional activity is greater among 1956 pairs 
than 1935 pairs. In the case of the 1956 
pairs, the sons averaged 21 years of age 
and consequently only the occupations they 
planned to enter are available to contrast with 
the occupations engaged in by their fathers. 
Among the 100 pairs of fathers and sons used 
in preceding comparisons, 27 sons did not re- 
cord their occupational plans. The sons’ oc- 
cupational plans are compared with their 
fathers’ occupations in Table 5. Here as in 
Table 4, comparisons are made in terms of 
five broad groupings of occupational activities. 
Whether the five coefficients are averaged, or 
the five distributions are averaged to give a 
summary coefficient, there results greater re- 
semblance among the 1956 fathers and sons 
than among the 1935 pairs. 

These measures indicate greater resem- 
blance between fathers and sons tested at the 
same age than between fathers and sons 
where the fathers were 38 years older than 
the sons when tested. But none of the differences
between the two groups is statistically
significant except the differences in
mean scores, which can best be explained on 
the basis that the two pairs of sons and the 
1956 fathers were samples of college men and 
the 1935 fathers were not so selected. 

Age of fathers when tested is apparently, 
then, of little or no significance when it comes 
to comparing fathers and their sons when 
sons are of college age. But Gjerde's data 
indicate that when sons are 14.6 years of 
age, they resemble their fathers to a lesser 
degree than is true when the sons are seven 
years older. 

The relationships reported here are about 
what would be obtained if resemblance in 
terms of height was under consideration. It 
would make very little difference whether 
fathers were 21 or 58 years of age in com- 
paring them with their 21-year-old sons, but 
it would make a difference if fathers of any 
age were compared with sons who were only 
14.6 years of age. The relationships are simi- 
lar with respect to height and interests be- 
cause both are surprisingly stable from 21 
years onward, although no claim is made that 
interests are as stable as height. 

Our guess is that the slight differences be- 
tween fathers and sons in the 1935 and 1956 
investigations are not due to differences in 
age, but to the fact that the 1935 fathers 
differ more in fields of interest and levels of 
interest, to use Super’s terminology (1; 11, 
p. 132), than do the 1956 fathers and both 
samples of sons. 

Do sons follow their fathers' occupations?
Jensen and Kirchner (5), in reviewing the
literature on the subject, refer to reports that
range from "No" to "Yes" as answers to the
question. On the basis of a widespread survey,
they conclude that "sons do tend to follow
their fathers' general type of occupation."
The data in Table 4 support this conclusion
to the extent of a net correlation of .30 to .44
where chance has been taken into account.


Table 5

Occupations of Fathers vs. Occupational Plans of Sons While in College

Strong 1956
Chance
Pairs

True
Pairs
N
Average of 5 r's
Median of 5 r's
r of 5 distributions combined

73
A8
38
44

Strong 1935

Chance
Pairs

True
Pairs
99
13
18
15


Effect of Age of Sons Upon Resemblance 


Gjerde reports significantly lower resem- 
blance coefficients than do the other four in- 
vestigators (see Table 2). Gjerde’s sons av- 
eraged 14.6 years of age whereas the sons in 
the other investigations were college students. 
His sons were a superior group as far as boys 
of that age are concerned since their average 
IQ was 136. How they compared in this re- 
spect with the other sons is unknown. 

Was the youth of Gjerde’s boys responsible 
for the lower resemblance coefficients? Roff 
(7) has suggested that the Gjerde boys were 
a more restricted sample and that the lower 
resemblance coefficients could be explained in 
part on that basis. Actually the Gjerde sons 
have an average standard score of 29.4 on 19 
scales ⁴ which is somewhat less than the 30.8 
to 32.4 of the other five samples. And the 
average standard sigmas of Gjerde's sons of 
8.8 and fathers of 10.6 are both lower than 
the sigmas of the other four samples, i.e., 11.3 
to 11.6. Rough calculations indicate that the 
rather slight restriction in scores of the Gjerde 
data should reduce the resemblance coeffi- 
cients somewhat, but it does not seem likely 
that it could cause as great a reduction as 
actually occurs. 

Gjerde reports mean scores for fathers and
sons on 39 occupational scales, in which the
fathers average 1.3 standard score lower than
the sons. But this average is not typical of
all scales, since there are 14 scales in which
fathers and sons differ by 6 or more in mean
scores. Sons average 9.5 higher on dentist,
musician, and the seven scales in Group IV,
and they average 7.0 lower on CPA, president,
and three scales in Group V. In general,
these differences characterize those between
heterogeneous groups of junior and
senior high school students and college seniors,
differences that are attributed to both
age and selection, since many high school
students do not become college seniors. There
are, however, too many unknown factors to
warrant a guess as to how much Gjerde's sons
differ from their fathers in mean scores because
of age and/or selection. It is apparent
that such differences would lower resemblance
coefficients. We are still left with the question:
Do the interests of sons become more
similar to the interests of their fathers as
they increase in age from 14.5 years upwards?


4 There are 20 scales common to the six samples,
but the CPA scale has been omitted because the 74.46
reported by Gjerde as the raw sigma of the boys on
that scale is obviously a typographical error.


Resemblance in Terms of Interests 
vs. Other Factors 

We concluded in 1943 that the permanence of
interest scores is somewhat less than that of
intelligence test scores, but greater over
the college period than that of college grades,
and distinctly greater than that of attitude test
scores (8). This conclusion was reaffirmed in 1951 
(9). The same relationships hold with re- 
spect to father-son resemblance. Resemblance 
in terms of interests is .35, using the more re- 
liable scales, which is somewhat less than the 
coefficient of .50 on intelligence tests but ap- 
preciably higher than the correlations con- 
cerning personality tests. 

Correlations between identical and fraternal 
twins were reported by Carter in 1932 (2). 
The former averaged .50, the latter .28 on 23 
occupational scales. On this basis, the resemblance
between fathers and sons is equal 
to or possibly higher than that between fra- 
ternal twins, but distinctly not so great as the 
resemblance between identical twins. 


Validity of Occupational Choice 

Since the data were available, we have calculated
the correlation between the occupational
plans of the 1935 sons, the majority of
whom were in college during 1927-30, with
the occupations in which they were engaged
in 1949. The net coefficient is .84 (see
Table 6). This coefficient has been derived
in the same manner as those in Tables 4 and
5. Reference to these two tables discloses the
fact that fathers' occupations agree equally
well with sons' occupations and with sons'
occupational plans while in college. This is not
surprising since the correlation between sons'
occupational plans and future occupation is
.84.


Table 6

Occupational Plans of Strong 1935 Sons While in College
vs. Occupations of These Sons in 1949

Strong 1935
                                   True          Chance
                                   Pairs         Pairs         Net

N                                   83            83
Average of 5 r's                [illegible]   [illegible]
Median of 5 r's                    .89           .30
r of 5 distributions combined      .87           .24           .84

This coefficient may be viewed as the va- 
lidity of occupational choice of college stu- 
dents, mostly seniors, in terms of broad classi- 
fications of occupations engaged in about 22 
years later on. Previously (10) we estimated 
the validity as .69. The two coefficients of 
.84 and .69 may be equally meaningful if one 
takes into account that they were calculated 
in two entirely different ways, that the for- 
mer concerned agreement in terms of five 
broad classifications of occupations and the 
latter in terms of specific occupations, and 
that the former was based primarily on col- 
lege seniors and the latter upon freshmen, 
from whom we should not expect such good 
agreement. 


Summary 


Estimation of "true" resemblance must represent
resemblance between fathers and sons,
as such, and not resemblance between all 
men, or even the general resemblance be- 
tween men in any two samples under consid- 
eration. Correlations between chance pairs 
have been deducted from correlations be- 
tween true pairs to provide a more accurate 
measure of resemblance. 
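
How the chance correlations were "deducted" is not spelled out in this part of the paper. Simple subtraction does not reproduce the reported net values (.47 minus .16 is .31, not .44). The square root of the difference of the squared coefficients does reproduce them to two decimals. The sketch below checks that reading against the figures in Tables 4 and 6. The formula and the function name `net_resemblance` are inferences from the printed numbers, not something stated by the author here.

```python
import math

def net_resemblance(r_true, r_chance):
    """Hypothetical reconstruction: net r = sqrt(r_true^2 - r_chance^2).

    This is an assumption inferred from the reported values, not a formula
    given in the text.
    """
    return math.sqrt(r_true ** 2 - r_chance ** 2)

# (true-pair r, chance-pair r, net value reported in the article)
reported = {
    "Table 4, average of 5 r's": (0.47, 0.16, 0.44),
    "Table 4, distributions combined": (0.37, 0.21, 0.30),
    "Table 6, distributions combined": (0.87, 0.24, 0.84),
}

for label, (r_true, r_chance, net_reported) in reported.items():
    net = net_resemblance(r_true, r_chance)
    print(f"{label}: computed {net:.2f}, reported {net_reported:.2f}")
```

On this reading the net coefficient is the correlation whose squared value is the true-pair variance in common less the chance-pair variance in common.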

Father-son resemblance must be construed
to mean resemblance in certain respects but
not in all respects. Just as fathers and sons 
may have similar height but not the same 
color of eyes, so they may agree in certain in- 
terests but not in others. Resemblance based 
on all interests may be slight, but resemblance 
based on some grouping of interests may be 
significant. 

An interest profile should express a pattern 
of interests even better than a single scale. 
On such a basis one may explain the higher 
resemblance between profiles (.40) than be- 
tween scores on interest scales (.30 to .35) 
and the definitely higher correlation between 
profiles than between items (.27). 


Received October 15, 1956. 


References 


1. Barnett, G. J., Handelsman, I., Stewart, L. H.,
& Super, D. E. The occupational level scale
as a measure of drive. Psychol. Monogr.,
1952, 66, No. 10 (Whole No. 342).

2. Carter, H. D. Twin similarities in occupational
interests. J. educ. Psychol., 1932, 23, 641-655.

3. Forster, M. C. A study of father-son resemblances
in vocational interest patterns. Master's
thesis, Univer. of Minnesota, 1931.

4. Gjerde, C. M. Parent-child resemblance in vocational
interests and personality traits. Ph.D.
thesis, Univer. of Minnesota, 1949.

5. Jensen, P. G., & Kirchner, W. K. A national
answer to the question, "Do sons follow their
fathers' occupations?" J. appl. Psychol., 1955,
39, 419-421.

6. Kelly, E. L. Consistency of the adult personality.
Amer. Psychologist, 1955, 10, 659-681.

7. Roff, M. Intra-family resemblances in personality
characteristics. J. Psychol., 1950, 30,
199-227.

8. Strong, E. K., Jr. Vocational interests of men
and women. Stanford, Calif.: Stanford Univer.
Press, 1943.

9. Strong, E. K., Jr. Permanence of interest scores
over 22 years. J. appl. Psychol., 1951, 35,
89-91.

10. Strong, E. K., Jr. Validity of occupational
choice. Educ. psychol. Measmt, 1953, 13,
110-121.

11. Strong, E. K., Jr. Vocational interests 18 years
after college. Minneapolis: Univer. of Minnesota
Press, 1955.





Journal of Applied Psychology
Vol. 41, No. 5, 1957


Reported Driving Speeds and Previous Accidents ¹


Roger G. Stewart


Institute of Transportation and Traffic Engineering,
University of California, Los Angeles 


Considerable emphasis is being directed to- 
day toward fast driving as a major cause of 
accidents. In the writer’s opinion, two fac- 
tors have been primarily responsible for this 
emphasis: a knowledge of certain statistics 
which purport to show that speeding viola- 
tions are frequently involved in automobile 
accidents, and a desire by safety officials and 
other interested individuals to find a satisfac- 
tory and feasible solution for the nationwide 
problem of accidents. Statistics often refer to 
a tabulation of “causes” of accidents shown 
in official accident reports. To illustrate, sta- 
tistics gathered from various sources annually 
by the National Safety Council have sug- 
gested that speeding violations and/or driv- 
ing too fast for conditions may be associated 
with 25 per cent or more of all serious acci- 
dents (1). The data in accident reports and 
summaries are subject to some unreliability 
and uncertainty; yet these statistics provide 
some basis for the current attention being 
given to speeding as a major factor in auto- 
mobile accidents. 

One shortcoming of the data in accident re- 
ports will be indirectly examined in this pa- 
per: that no information is given about speed- 
ing violations or fast driving speeds without 
the occurrence of accidents. How often does 
high-speed driving fail to lead to an accident? 
Is fast driving in itself, on a percentage basis, 
as likely to involve the driver in an accident 
as other traffic violations such as turning from 
improper traffic lanes or slowing down sud- 
denly in traffic without sufficient warning? 
If one concedes that fast driving does not always
lead to an accident, then the problem
may be conceived as a comparison of fast vs.
slow driving in relation to accidents. In this
paper, some research findings from official
traffic records are presented for individuals in
the younger age groups in relation to their
reported driving speeds.


1 The author is very grateful for the assistance of
the following individuals during this study: Dr.
Harry W. Case, Department of Engineering, University
of California, Los Angeles; Mr. Paul Mason and
Mr. Fred P. Williams, Department of Motor Vehicles,
State of California, Sacramento; Dr. T. H. Southard
and Mr. F. H. Hollander, Numerical Analysis Research,
Department of Mathematics, University of
California, Los Angeles.


Method 
Subjects 


The subjects were 275 students in psychology
classes at the University of California, Los Angeles,
during 1955 and 1956. Most subjects had been driving
in California for two years or more. The subjects
were fairly evenly divided regarding sex, and
the median age was about 21 years.


Driving Inventory 


All subjects were administered a driving inventory.
This inventory contained items of various types on
one's personal driving habits and experiences. Four
items of the inventory referred to driving speed on
the highway in specified conditions. These four
items on driving speed were used for classifying the
275 subjects in terms of speed.

One item was the following:

What is your usual highway driving speed
(MPH) during daylight?

20    30    40    50    60    70    80    90

Each subject was instructed to indicate his response
on a continuum similar to those used in
numerical rating scales.


Procedure 


Determining Speed Groups. The subjects were
classified in terms of driving speed by use of a simple
summation of the numerical values on the four
speed items. The four values per subject were
summed to yield a total for each subject, and the
275 individual totals were arranged in order. Then,
cutting points were found to mark off the upper and
lower one-third of the 275 cases. The 89 subjects in
the upper group were called "FAST" drivers, and
the 89 individuals in the lower group were named
"SLOW" drivers, leaving 97 people who were referred
to as "MEDIUM" speed drivers.

Determining Violators. After the testing, the official
records of traffic citations and accidents for all
subjects while driving in California were obtained
from the Department of Motor Vehicles in Sacramento,
California. Records for nine individuals that
could not be clearly identified were not included in
the sample of subjects reported here. Using the official
records, the 275 subjects were classified as violators
or nonviolators concerning speeding offenses as
well as all moving or traffic offenses. Fifty-seven of
the 275 subjects had received one or more citations
for speeding in California, and a total of 103 individuals
(including the 57 violators of speeding offenses)
had been given one or more citations for a
traffic offense of some kind.
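
The speed-group classification described above (sum the four speed items, order the totals, cut off the top and bottom thirds) can be sketched as follows. This is only an illustration under stated assumptions. The function names, the rank-based handling of ties, and the nine sets of responses are all hypothetical. The paper reports groups of 89, 97 and 89, and says only that cutting points were found, so its exact procedure may have differed.

```python
from collections import Counter

def speed_totals(responses):
    """Sum the four speed-item values (MPH) for each subject."""
    return [sum(items) for items in responses]

def classify_by_rank(totals, n_slow, n_fast):
    """Label the n_slow lowest totals SLOW, the n_fast highest FAST, the rest MEDIUM.

    Ties are broken by original order here, which is an assumption.
    """
    order = sorted(range(len(totals)), key=lambda i: totals[i])
    labels = ["MEDIUM"] * len(totals)
    for i in order[:n_slow]:
        labels[i] = "SLOW"
    for i in order[len(order) - n_fast:]:
        labels[i] = "FAST"
    return labels

# Hypothetical responses: four speed items per subject (not data from the study).
responses = [
    (45, 40, 50, 45), (55, 50, 60, 55), (65, 60, 70, 65),
    (50, 45, 55, 50), (60, 55, 65, 60), (70, 65, 75, 70),
    (40, 35, 45, 40), (58, 52, 62, 57), (75, 70, 80, 75),
]

totals = speed_totals(responses)
labels = classify_by_rank(totals, n_slow=3, n_fast=3)
counts = Counter(labels)
print(list(zip(totals, labels)))
```

For the study itself the call would be `classify_by_rank(totals, n_slow=89, n_fast=89)` on the 275 totals.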


Results 
Individuals With Accidents 


Table 1 shows the number and percentage 
of individuals in each of the three speed 
groups with prior recorded accidents in Cali- 
fornia. 

Table 1 shows that approximately equal 
percentages of subjects in the three speed 
groups had been involved in accidents dur- 
ing the period considered. Assuming random 
sampling from a common population, the chi 
square for the frequencies shown in Table 1 
equals only .75. This indicates, therefore, 
that involvement in accidents for these sub- 
jects is independent of their speed classifica- 
tion based on the inventory responses. This 
fact suggests that attention directed toward 
individuals for speeding or fast driving will 
not substantially reduce the total number of 
accidents for this type of driving population. 
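
The chi square reported for Table 1 can be checked with a short calculation. The sketch below uses the standard Pearson chi-square test of independence on the Table 1 frequencies. The MEDIUM "no accidents" count is read as 81, an assumption based on the group size of 97 less the 16 subjects with accidents. The function name `chi_square` is illustrative.

```python
def chi_square(table):
    """Pearson chi-square for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, dof

# Rows: SLOW, MEDIUM, FAST; columns: no accidents, one or more accidents.
table1 = [
    [73, 16],
    [81, 16],  # 81 is an assumed reading of the garbled cell (97 - 16)
    [70, 19],
]

chi2_t1, dof_t1 = chi_square(table1)
print(f"chi square = {chi2_t1:.2f} with {dof_t1} degrees of freedom")
```

With 2 degrees of freedom the value falls far below 9.21, the .01 criterion the article cites for its three-group tables, which agrees with the conclusion of independence.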


Table 1

Relationship of Individuals With and Without Prior
Recorded Accidents to Speed Groups

                                One or     Percentage
Speed           No              More       One or
Groups          Accidents       Accidents  More

SLOW               73              16         18
MEDIUM             81              16         16
FAST               70              19         21

Total             224              51      Mean 19


Individuals With Speeding Citations


The importance of the above finding depends
partly on whether the method of classifying
the subjects in terms of speed has considerable
validity. This, in turn, relates to
the use of inventory responses for classification
purposes. To explore this question, the
number of subjects with one or more citations
for speeding offenses was determined for each
speed group. These results are shown in
Table 2.

Table 2 shows that the percentage of sub- 
jects with speeding citations increases pro- 
gressively from the SLOW to the FAST group. 
Of the 57 individuals with citations, 34, or 
more than half of them, had been classified as 
FAST drivers. Chi square for the contingency 
table of frequencies for all three groups is 
25.72, which far exceeds 9.21, the value re- 
quired for significance at the .01 level. This 
means that the subjects were classified with 
respect to prior speeding citations more ac- 
curately than one would obtain by chance. 
This result indicates some validity for our 
findings concerning the grouping of subjects 
into the speed groups and, therefore, concern- 
ing individuals with previous accidents in the 
speed groups. 


Individuals With Traffic Citations 


Further results show that the subjects, as 
classified into the speed groups, tend to have 
records of traffic citations consistent with their 
classification. Table 3 shows the number and 
percentage of subjects in each speed group 
who had received one or more citations for a 
moving traffic offense (including speeding of- 
fenses). 

Table 3 shows a progressive increase from
the SLOW to the FAST group. Half of the
103 individuals with traffic citations had been
classified in the FAST group. Chi square
equals 29.76, which far exceeds the tabled
value at the .01 level. This result is additional
evidence that the inventory responses
have some validity for the grouping of the
275 subjects in terms of speed.


Table 2

Relationship of Individuals With and Without Prior
Speeding Citations to Speed Groups

                                   One or     Percentage
Speed               No             More       One or
Groups       N      Speeding       Speeding   More
                    Citations      Citations

SLOW         89        81              8          9
MEDIUM       97        82             15         15
FAST         89        55             34         38

Total       275       218             57      Mean 21


Table 3

Relationship of Individuals With and Without Prior
Traffic Citations to Speed Groups

                                   One or     Percentage
Speed               No             More       One or
Groups       N      Traffic        Traffic    More
                    Citations      Citations

SLOW         89        72             17         19
MEDIUM       97        63             34         35
FAST         89        37             52         58

Total       275       172            103      Mean 37


Relationship Between Speeding Citations, 
Traffic Citations, and Accidents 


The results just presented show that the 
individuals classified as FAST drivers had not 
been involved in accidents more often than 
the MEDIUM speed or SLOW drivers. How- 
ever, it is still possible that the use of some 
other method would support this commonly 
accepted belief. For example, one might main- 
tain that the subjects with previous citations 
for speeding constitute the FAST drivers, and 
that these people would show more involve- 
ment in accidents than the subjects without 
speeding citations. Using this criterion, the 
275 subjects were grouped regarding speeding 
citations (none vs. one or more) and acci- 
dents (none vs. one or more), and chi square 
was computed to test the independence of 
these two variables. This is shown in Table 4. 

Table 4 yields a chi square of only 2.83, 
which falls far short of 3.84, the required 
value for significance at the .05 level. 
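
For a fourfold table the chi square can be computed directly from the four cell counts. The sketch below uses the usual shortcut formula, N(ad - bc)² divided by the product of the four marginal totals, without a continuity correction. Two of the cell counts (36 and 182) are not legible in the printed table and are inferred here from the marginal totals given in the text (51 with accidents, 57 with speeding citations, 275 subjects). With these assumed cells the result is about 2.87, close to but not identical with the reported 2.83, and the conclusion is the same.

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square for a 2x2 table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

# Rows: accidents, no accidents; columns: no speeding citations, speeding citations.
# 36 and 182 are inferred from the marginal totals (assumption), 15 and 42 are printed.
a, b = 36, 15
c, d = 182, 42

chi2_t4 = chi_square_2x2(a, b, c, d)
significant_at_05 = chi2_t4 > 3.84  # criterion for 1 degree of freedom, as cited in the text
print(f"chi square = {chi2_t4:.2f}, significant at .05: {significant_at_05}")
```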

Turning to total traffic citations, the same 
outcome is obtained, chi square being 1.56. 
In other words, previous involvement in acci- 
dents for the subjects could not have been 
predicted with any success from knowledge of 
prior speeding citations or total traffic cita- 
tions. 


Table 4

Relationship of Individuals with Speeding
Citations and Accidents

                    No
                    Speeding      Speeding
                    Citations     Citations     Total

Accidents               36            15           51
No Accidents           182            42          224

Total                  218            57          275

χ² = 2.83 (not significant at .05 level)


Discussion 


Our results fail to show that a significantly 
higher percentage of the faster drivers, con- 
sidered by the speed classification and by 
speeding citations, had become involved in 
accidents than slower drivers. Stated posi- 
tively, the individuals in the group of FAST 
drivers and those subjects with speeding cita- 
tions appear to be as safe as the other drivers. 
No basis was found for concluding that in- 
volvement in accidents is related to stated 
driving speeds on the open highway or to 
citations for speeding offenses. For these 
subjects, previous involvement in accidents 
seemed to be independent of these two factors. 

Evidence was presented for the validity of 
our grouping of the drivers into the three 
speed groups. Significant differences were 
found between the FAST and the SLOW 
drivers in speeding citations and in total 
traffic citations. This means that the FAST 
drivers were, in part, those whom many peo- 
ple would expect to become involved most 
often in accidents. In each instance, how- 
ever, it was found that the subjects with one 
or more prior citations were not different from 
the subjects without citations in terms of acci- 
dent involvement. This finding does not 
seem to be consistent with the commonly ac- 
cepted views that traffic violators and fast 
drivers in the younger age groups constitute 
the chief hazard to safe driving on our high- 
ways. 

Throughout this discussion, no mention has 
been made of the distinction between being 
involved in an accident and "causing" it, since
the official records indicated only whether the
individual was a driver involved in a reported 
accident. While more facts about each acci- 
dent would be desirable for some purposes, 
the judgment of probable cause or “fault” of 
each accident from such information, even by 
psychologists (!), would be highly subjective. 

In conclusion, the results of this study sug- 
gest that individuals in the kind of popula- 
tion considered who report consistently higher 
driving speeds than average have traffic rec- 
ords free of accidents as often as other drivers. 
This finding also holds for individuals with
previous speeding citations and traffic citations
of all kinds. If these results should be
verified with drivers in the general public,
they would certainly command some re-ex- 
amination of certain emphases and enforce- 
ment procedures based on the belief that high- 
speed drivers become involved in accidents 
more often than medium-speed or slow drivers. 


Received October 22, 1956 


Reference 


1. National Safety Council. Accident facts. Chicago,
Illinois: Natl. Safety Council, 1955.





Journal of Applied Psychology 
Vol. 41, No. 5, 1957 


The Measurement of Creativity in Machine Design ¹


W. A. Owens 


Iowa State College


C. F. Schumacher 


University of Minnesota 


and J. B. Clark 


Western Reserve University 


In spite of what is now its clearly apparent 
social importance, psychologists have thus far 
devoted relatively little time to the investiga- 
tion of creativity. That work which has been 
done may be assigned to three broad cate- 
gories for purposes of review. 

First, both necessarily and conventionally, 
a substantial proportion of what has been 
written is primarily descriptive, speculative, 
or intended to stimulate research. Prominent 
among recent publications of this character 
are those of Guilford (3), Thurstone (18), 
and Osborn (12). Less recent, and some- 
what less directly relevant to creativity, are 
such familiar formulations as Dewey's de- 
scription of the steps in the problem-solving 
process, the Gestalt schools’ explanation of in- 
sight learning in terms of the reorganization 
of elements to form new and meaningful 
wholes, and Spearman’s identification of the 
ability to educe relationships with the highest 
form of cognition. 

Second, there is a type of investigation,
germane to creativity, which has as its focus
increasing the supply of potential talent
either through appropriate training or through
the discovery of conditions optimally conducive
to the problem-solving process. For
example, Taylor (16, 17) is the present leader
of a broadly conceived program of this type;
and Calvert et al. (2) have recently been
concerned with a project specifically devoted
to the area of engineering problems.


1 This research was done under Contract Nonr
§30(02) between the Office of Naval Research and
Iowa State College; however, the opinions and assertions
contained herein are those of the writers and
are not to be construed as official or representing the
views of the Navy Department or the Naval service
at large.

2 The writers especially wish to acknowledge the
insight of Dean J. F. Downie Smith, of the Engineering
Division at Iowa State College, who was
largely responsible for the initiation of this work,
and the administrative assistance of Dr. George
Town, Associate Director of the Engineering Experiment
Station. They also wish to recognize the contributions
of the following persons who were, at one
time or another, associated with the project: staff
members R. B. McHugh and A. N. Colby; graduate
assistants R. F. Boldt, V. N. Campbell, M. Hirt, B.
Jacobson, H. Martinek, Georgia Maxson, R. I.
Moren, D. G. Nichols, J. Shearer, and A. Truesdell.

Third, there is a class of studies in which 
primary emphasis has been placed upon the 
identification of individuals possessing out- 
standing potentialities for creative work. 
Prominent among these is surely the comprehensive
program of Guilford and associates
(4, 5, 6, 7, 15) devoted to the development of
measures, and to the factorial description of
various patterns, of "high-level aptitudes."
It is hypothesized that one of these patterns
characterizes the creative individual, although
it is also recognized that there may be several
types, and thus several patterns, of creativity.

Within the broad field of engineering, Har- 
ris and Simberg (9), of the A. C. Spark Plug 
Division of General Motors, have developed 
what seems to be a very promising Test of 
Creative Ability, although, to the knowledge 
of the writer, the only report on their work 
to date, excepting an abstract by Harris (8), 
is contained in the Examiners Manual for the 
test. Conspicuously present here is some en- 
couraging preliminary evidence as to validity. 

The present study, involving a highly specific
criterion, was initiated because it was
recognized that the United States was critically
short of creative machine designers during
World War II, and that an alarmingly
small proportion of those present were either
native born or the products of American
Schools of Engineering. It has two phases:
one concerned with the concurrent validities
of measures applied to various industrial
groups, and a second directed at discovering
the predictive value of the same measures already
applied to over 1,500 students at 24
colleges and universities. The present report
concerns the first phase only.


Purpose 
Accordingly, the formal purpose of this in- 
vestigation was to develop, and to evaluate, 
tests for the discrimination of creative from 
noncreative machine designers. 


Methods 


Limitations of space do not permit a detailed de- 
scription of all aspects of the methods here em- 
ployed. However, since many of them are familiar 
and clearly implied, only those which are central or 
somewhat unusual will be emphasized. Where more 
complete information is desired, it may be obtained 
from the Examiners Manual for the present tests 
(13), or by direct communication with the senior 
author. 


Overview 


In brief, this study involved the construction of 
nine measuring devices which were item-analyzed 
and cross-validated against a primary criterion of 
the rated creativity or noncreativity of 295 engi- 
neers in 31 industrial firms. 


Hypotheses 


The following structuring hypotheses were formu- 
lated subsequent to the interviewing of a number of 
creative engineers and an examination of the litera- 
ture, but prior to the actual task of test construc- 
tion: 


1. It was assumed, because of restriction at a high 
level, that differences between creative and noncrea- 
tive machine designers are not primarily attributable 
to general mental ability. 

2. It was assumed that constructive, high-level 
aptitudes or abilities cannot surely be fairly ap- 
praised with recognition-type tests, and that those 
constructed should be primarily of the completion 
type. 

3. It was assumed that differences between crea- 
tive and noncreative could not be fairly attributed to 
aptitude unless age, education, and relevant job ex- 
perience were controlled. 

4. Since the organization of creativity is incom- 
pletely determined, it was assumed to be “specific,” 
the safest assumption, and measuring devices were 
restricted to areas of some felt machine design rele- 
vance. 


W. A. Owens, C. F. Schumacher, and J. B. Clark 


Measuring Instruments 


The devices themselves may be briefly described 
as follows: 


The Power Source Apparatus Test.3 The S is given a power source and
a motion sequence and is to sketch as many intervening mechanisms as
possible. The scores are for absolute number and number “workable,”
with the workable solutions categorized and weighted for
discrimination (8 items).

The Design a Machine Test. The S is given a particular purpose to be
served and is to sketch as many appropriate mechanisms as possible.
The scores are for absolute number and number “workable.”

The Applications of Mechanisms Test.3 The S is given a particular
mechanism and is to enumerate as many types of machines in which it
might function as possible. The score is the number listed (7 items).

The Three Dimensional Space Relations Test. The S is to mentally fold
a flat pattern to form a cube and then to indicate the positions of
three of its faces following various patterns of rotation. The score
is the number correct.

The Figure Matrices Test. The S is to derive the rules by which a
figure evolves in two dimensions and to apply these in order to
supply a missing figure at any given point in the matrix. The score
is the number correct.

The Number Series Test. The S is to derive the rule or principle by
which a series of numbers is generated and to apply it to supply
certain missing numbers. The score is the number correct.

The Unstructured Test. The S is given a series of meaningless,
incomplete “mechanical-type” drawings and is to indicate what they
could be, how many different things can be seen in each, and to
circle the relevant parts. The scores are for number of objects
identified, number of responses per minute, number involving motion,
number of line segments used in each response, number of machines
identified, and percentage of machine responses.

The Personal Inventory.3 This is a quasi forced-choice inventory,4
originally of 197 items, which deals with interests, attitudes,
opinions, personal characteristics and experiences. The score is the
number of weighted responses typical of the creative machine
designer; item weights are based upon amount and consistency of
discrimination (50 items).

The Personal History Form.3 This is a single sheet, originally of 48
items, which deals with personal background. The score is the number
of weighted responses typical of the creative machine designer, with
all responses assigned to the same five-point scale and
differentially weighted for amount and consistency of discrimination
(8 items).

Only four of the above instruments survived item-analysis and
cross-validation. These were the Power Source Apparatus Test, the
Applications of Mechanisms Test, the Personal Inventory, and the
Personal History Form. These compose the revised battery.

3 Retained in the revised battery.

4 Precise preference matching was not attempted.





Measurement of Creativity in Machine Design 


Subjects 


The several samples of subjects, the batteries ad- 
ministered, and the pattern of cross-validation are 
indicated in Fig. 1. It should be noted that the n’s 
for all groups, except the Patent Index group, include 
both creative and noncreative components. 


Criterion 


Members of the primary criterion groups were de- 
fined as follows: 

Creative designers are persons who have demon- 
strated the ability to comprehend the nature of a 
design problem, and to produce a novel, ingenious, 
or original solution in the form of a total, functional, 
and practical mechanism. Creativity, in this sense, 
does not necessarily involve the conception of an en- 
tirely new principle, but does involve the combina- 
tion of existing principles or mechanisms in such a 
way as to produce a new and unique solution to a 
previously unsolved problem. 

Noncreative designers are persons whose major 
function is to work out the details of a design; they 
are the engineers who do not produce original ideas, 
but who work out the routine problems of what ma- 
terials to use, and who smooth out the design ac- 
cording to established procedures. 

Industries were contacted to see if they had such 
individuals and if they could be tested. Where re- 
plies were positive, their supervisor, usually the chief 
engineer, was instructed to identify engineers who 
could be considered creative according to the defini- 
tion, and to select a like number of noncreatives 
matched as nearly as possible for group age, education,
and experience. Subsequent t tests on these
matching variables revealed no difference significant 
at even the 5% level. 

A secondary sort of criterion was obtained by 
mailing the self-administering measures (the Per- 
sonal Inventory and Personal History Form) to a 
sample of 76 individuals revealed by the U. S. Patent 
Index to be creative, in fact, in the sense of holding 
a large number of patents (minimum 13 and mean 
22 + for the period 1947 through 1951). Their re- 
sponses were contrasted with those of the rated non- 
creatives for item-analysis purposes and it was noted, 
in accordance with expectation, that they were simi- 
lar in direction to those of the rated creatives but 
that discrimination was sharpened by employing the 
Patent Index Group. Insertion of this P. I. group 
actually served a dual purpose—it permitted some 
evaluation of the rating criterion and it made pos- 
sible a modified and preliminary cross-validation of 
those measures potentially standing most in need 
of it. 


Statistical Treatment 


The data were treated by the application of such 
statistical techniques as the t test, chi-square test,
point-biserial correlation, multiple regression, and dis- 
criminant function analysis. The following methods 
or applications are mentioned because they seem a 
bit out of the ordinary. 


[Fig. 1 (diagram). Item analysis samples: Battery A (N = 70, 9
companies): Power Source Apparatus Test, Three Dimensional Space
Relations Test, Figure Matrices Test, Personal Inventory. Battery B
(N = 66, 7 companies): Design a Machine Test, Number Series Test,
Unstructured Test, Personal History Form. Patent Index sample(a)
(N = 76): Personal History Form, Personal Inventory. Validation
sample (N = 159, 15 companies): Power Source Apparatus Test (total
and workable), Application of Mechanisms Test(b), Personal Inventory,
Personal History Form.]

(a) Mean patents from '47 thru '51 = 22.6.
(b) Patterned after one, atypical P.S.A. item; cross-validated by
splitting validation sample.

Fig. 1. Samples, test batteries, and pattern of cross-validation.


Item Selection 


On the Power Source Apparatus Test, Design a Machine Test,
Unstructured Test, and the Personal History Form, differences in the
number of item responses obtained from creative and noncreative
groups were evaluated through application of the H test (11), a test
of the significance of the difference between ranks. The remaining
tests and/or scoring methods were appraised through the use of
contingency tables with border classifications of
creative-noncreative vs. some categorization of item response
(usually correct-incorrect). The 20% level of confidence was
established as the critical level for item selection.

In addition to meeting the foregoing criteria, each item was tested
for the existence of a so-called “company effect.” Such an effect
could be said to exist if the proportion of correct or diagnostic
responses made by members of the creative group was not independent
of classification by company. The chi-square test was employed to
evaluate this effect, and Snedecor’s (14) computational method was
utilized.
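The contingency-table screening described above can be illustrated with a minimal sketch. It assumes a 2 x 2 table (creative vs. noncreative by correct vs. incorrect), an uncorrected Pearson chi-square, and the 20% level as the retention cutoff; the function names and the example counts are hypothetical and are not taken from the study.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table
    [[a, b], [c, d]]; returns (statistic, p_value) on 1 degree of freedom."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one case")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 df the upper-tail chi-square probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def retain_item(a, b, c, d, alpha=0.20):
    """Keep an item when the group-by-response association reaches alpha.
    alpha = 0.20 mirrors the 20% level named in the text."""
    _, p_value = chi_square_2x2(a, b, c, d)
    return p_value < alpha


# Hypothetical counts: rows = creative, noncreative; cols = correct, incorrect.
stat, p_value = chi_square_2x2(20, 15, 12, 23)
print(round(stat, 3), round(p_value, 3), retain_item(20, 15, 12, 23))
```

A lenient item-level cutoff of this kind admits many chance items, which is why a second check at the subtest level is needed.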


Test Selection 


Before accepting a subtest for inclusion in the revised battery,
following item analysis, the critical ratio described by Brozek and
Tiede (1) was applied to determine whether or not the number of items
in the test which were significant above the critical (20%) level was
greater than could be attributed to chance. Here, the 5% level was
selected as the critical one for the inclusion of a subtest.
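One way to read this subtest check is as a normal approximation to the binomial: if every item were a chance item, about 20% of them would pass the item-level screen. The sketch below rests on that assumption. The one-tailed test and the item counts are illustrative choices, not details confirmed by the source.

```python
import math


def critical_ratio(n_items, n_significant, alpha=0.20):
    """(observed - expected) / SD for the count of items passing the screen,
    under the chance hypothesis that each item passes with probability alpha."""
    expected = n_items * alpha
    sd = math.sqrt(n_items * alpha * (1 - alpha))
    return (n_significant - expected) / sd


def upper_tail_p(z):
    """One-tailed normal probability of a critical ratio at least this large."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# Hypothetical subtest: 50 items, 20 of which passed the 20% screen.
z = critical_ratio(50, 20)
print(round(z, 3), upper_tail_p(z) < 0.05)  # subtest kept if below the 5% level
```

With 50 items, chance alone would be expected to pass about 10, so a count of 12 would not justify keeping the subtest while a count of 20 would.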








Table 1

Odd-Even Test Reliabilities and Test Intercorrelations, Validation
Sample

[Table 1 entries are not reproduced here. Rows: Power Source
Apparatus, Total; Power Source Apparatus, Workable; Personal
Inventory; Applications of Mechanisms; each with Pooled(a) and
Av. Est.(b) values. Surviving column headings: P.S.A. (T),
P.S.A. (W), A.M.T. Table note: 5% r = 0.16.]

(a) Pooled = combined data for all companies.
(b) Av. Est. = weighted average of company by company estimates.


Results

The primary results of this investigation are summarized in the
three tables which follow. Table 1 contains odd-even reliability
estimates and intercorrelations for the revised creativity battery as
it was administered to the validation sample. The tests are virtually
unspeeded, so the tabled estimates of reliability should not be
inflated. However, since the battery is to be used as a predictive
unit, it seems more pertinent to observe that the subtest
intercorrelations are relatively low. An exception is the high
correlation between total number of solutions and number workable on
the Power Source Apparatus Test. This may well raise a question as to
whether both scores are necessary, the brief answer to which seems to
be that even the former makes a marginally significant (5% level)
independent contribution to a multiple biserial correlation with the
criterion.

There appear in Table 2 the point-biserial correlations between each
of the several predictors and the criterion, utilizing both the
pooled and the average estimate data. In addition, the summarizing
Multiple R is given at the bottom of each of the two columns. It
should be noted that these R’s have an element of spuriousness, since
the estimated correlation of the Applications of Mechanisms Test with
the criterion is based upon only a split sample but was used as
though based upon the whole. Nevertheless, whatever error is involved
is of minor practical significance, since the multiple R’s are
dropped only a little more than one point if A. M. T. is removed
entirely.
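For readers unfamiliar with the statistic, a point-biserial correlation between a test score and the creative vs. noncreative criterion can be computed as in the following sketch. The scores and group labels are hypothetical; only the formula itself is standard.

```python
import math


def point_biserial(scores, is_creative):
    """Point-biserial r between a continuous score and a two-group criterion.

    Uses r = (M1 - M0) / SD * sqrt(p * q), where SD is the standard deviation
    of all scores (divisor n) and p is the proportion in the creative group.
    """
    n = len(scores)
    group1 = [x for x, flag in zip(scores, is_creative) if flag]
    group0 = [x for x, flag in zip(scores, is_creative) if not flag]
    if not group1 or not group0:
        raise ValueError("both criterion groups must contain cases")
    mean_all = sum(scores) / n
    sd = math.sqrt(sum((x - mean_all) ** 2 for x in scores) / n)
    if sd == 0:
        raise ValueError("scores have no variance")
    p = len(group1) / n
    mean1 = sum(group1) / len(group1)
    mean0 = sum(group0) / len(group0)
    return (mean1 - mean0) / sd * math.sqrt(p * (1 - p))


# Hypothetical scores for four designers; True marks the rated creatives.
print(round(point_biserial([1, 2, 3, 4], [False, False, True, True]), 4))
```

The value equals the ordinary product-moment correlation obtained when the criterion is coded 0 and 1.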

Table 3 contains the data relating to the 
goodness of the classification job done by the 
present battery; these include the differences 
between the means of creative and noncreative groups
(X̄c - X̄nc), the discriminant function weights and cutting
score, and the percentage of correct classifications. The average
estimate data are based upon standard scores, 
by companies, to increase comparability. 
They are presented in only summary form, 
since it did not seem practical to introduce 
a table for each company. (The previous 
comment relative to A. M. T. applies here 
also.) 
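The classification step behind the percentages of correct classification can be sketched as follows: each designer's weighted score sum is compared with a cutting score. The weights, cutting score, and cases below are hypothetical stand-ins rather than the tabled values, and the rule that scores at or above the cutting score are called creative is an assumption.

```python
def discriminant_score(scores, weights):
    """Weighted sum of a designer's subtest scores."""
    if len(scores) != len(weights):
        raise ValueError("one weight is needed per subtest score")
    return sum(w * x for w, x in zip(weights, scores))


def classify(scores, weights, cutting_score):
    """Assumed rule: at or above the cutting score counts as creative."""
    if discriminant_score(scores, weights) >= cutting_score:
        return "creative"
    return "noncreative"


def percent_correct(cases, labels, weights, cutting_score):
    """Percentage of cases whose predicted class matches the rated class."""
    hits = sum(
        classify(case, weights, cutting_score) == label
        for case, label in zip(cases, labels)
    )
    return 100.0 * hits / len(cases)


# Hypothetical two-subtest battery and four rated designers.
WEIGHTS = [0.5, 1.0]
CUTTING_SCORE = 5.0
CASES = [(4, 4), (2, 3), (6, 1), (1, 5)]
LABELS = ["creative", "noncreative", "creative", "creative"]
print(percent_correct(CASES, LABELS, WEIGHTS, CUTTING_SCORE))
```

Because the weights and cutting score are fitted to the same cases they classify, a percentage obtained this way tends to run high unless the function is cross-validated.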

Table 2

Point-Biserial Correlation Between Predictors and Criterion,
Validation Sample

Test                              Pooled       Av. Est.
Power Source Apparatus
  Total                            .13          .20*
  Workable                         .26†         .33†
Application of Mechanisms‡         .19          .36*
Personal Inventory              [illegible]  [illegible]
Personal History Form              .13          .21*
Multiple R                         .38          .50

* 5% level.
† 1% level.
‡ Sample split.


Interpretation

The percentage of cases correctly classified in Table 3 through use
of the pooled data (66%) is an undoubted underestimate, since it is
greatly attenuated by large company differences. On the other hand,
the percentage correctly classified through use of the average
estimate data (86%) is clearly an overestimate, since the
discriminant functions themselves were not cross-validated, and since
the number of predictor variables was relatively large and the number
of cases within a particular company sometimes quite small. The
writers are inclined to view the two values in question as crude
bounds within which a high percentage of future estimates of the
accuracy of the battery in classifying “creatives” and “noncreatives”
within a company may be expected to fall.


Table 3

Mean Differences and Discriminant Function Weights, Validation Sample

Pooled Data

Test                            X̄c - X̄nc     Weight
Power Source Apparatus
  Total                          1.4147      .000170
  Workable                       2.5459      .000071
Application of Mechanisms        5.8153      .000640
Personal Inventory               4.9667      .001872
Personal History Form            2.9118      .001328

Percentage Correct Classifications = 66%
Cutting Score = .007612

Av. Est. Data (Full Battery)

Percentage Correct Classifications = 86%
Range of Correct Classifications = 60% to 100%


Implications

A question logically arises at this point as to how the results
obtained argue for the validity of the structuring hypotheses. First,
then, on the hypothesis that “creatives” and “noncreatives” are not
differentiated by general mental ability, we have the following
evidence. Three tests, namely Space Relations, Figure Matrices, and
Number Series, could be construed as measuring component intellective
functions. In fact, none of these did discriminate the criterion
groups.

Second, on the hypothesis that age, education, and experience must be
controlled, we have only the evidence that this was done with
considerable success (r’s of -.01, .05, and .02, respectively), and
that discrimination of the criterion groups was still possible.

Third, on the hypothesis that creativity is somewhat “specific” to
the field in question, we have the following evidence. Of the six or
seven ability tests involved,5 three were quite specific to the area
of machine design. Of these three, two are represented in the final
battery; whereas, of the remaining three or four, none is so
represented.


Theory 


It seems an obligation to future investigators to attempt some
clarification of two issues. The first relates to the essential
nature of creativity in machine design; and the second to the
promising leads for its measurement. The latter actually involve the
“post mortem” hypotheses of the authors.


On the first issue, it seems intuitively clear 
that, since the number of machine elements is 
finite, creativity in design must consist in 
finding new combinations or organizations of 
these elements. In addition, the new com- 
bination or organization produced must not 
only be “good machine design” in some abso- 
lute sense, but it must also serve some highly 
specific purpose. It may well be this latter 
characteristic, at least in part, which differ- 
entiates creativity in machine design from 
creativity in the domains of art or music or 
literature. Thus conceived, it is a problem-solving,
goal-oriented, utilitarian sort of
ingenuity which is of present concern.


5 The Unstructured Test is not easily classified.










On the second issue, it seems noteworthy 
that the associative process is tightly controlled
in the test which makes the largest independent
contribution of any in the present battery:
the Power Source Apparatus Test.
Here both the initial powering motion and 
the final motion desired are fixed. In “bridg- 
ing the gap” the subject can utilize as solu- 
tions only those potential mechanisms which 
are adapted to both specifications. By con- 
trast, the Design a Machine Test, of job- 
miniature type, was a relatively poor dis- 
criminator. Here, with no really controlling 
restrictions or specifications, the subject was 
simply instructed to sketch a mechanism ap- 
propriate for a particular purpose. In broad 
perspective, it seems not unlikely that maxi- 
mally discriminating ability tests in the ma- 
chine design area must involve a highly struc- 
tured context, even as the job itself involves 
one. 

Shifting from measures of maximum per- 
formance to those of typical behavior, Jacob- 
son (10) has recently completed a Wherry- 
Gaylord iterative analysis of the items in the 
Personal Inventory. His results suggest that 
“asociability” and “egocentrism” are most 
important among the several clusters of char- 
acteristics which discriminate creative from 
noncreative machine designers. Verification is 
needed, but two measurement “targets” are 
tentatively presented. 


Conclusions 


1. A battery of four measuring devices for 
the discrimination of creative from noncrea- 
tive machine designers has been developed. 

2. The concurrent validity of this battery, 
within companies, is such that it would prob- 
ably predict the correct classification of about 
three-fourths of the members of two equal 
groups of creative and noncreative designers. 


Received October 31, 1956. 


References 


1. Brozek, J., & Tiede, K. Reliable and questionable significance in
   a series of statistical tests. Psychol. Bull., 1952, 49, 339-341.

2. Calvert, J. F., Hartenberg, R. S., Kliphardt, R. A., Shelley,
   H. P., & Denavit, J. Developing problem-solving skills in
   engineering. Final Rep., Contract N7onr-45012. Evanston, Ill.:
   Northwestern Univer., 1953.

3. Guilford, J. P. Creativity. Amer. Psychologist, 1950, 5, 444-454.

4. Guilford, J. P., Wilson, R. C., Christensen, P. R., & Lewis, D. J.
   A factor-analytic study of creative thinking. I. Hypotheses and
   description of tests. Rep. Psychol. Lab., No. 4. Los Angeles:
   Univer. Southern California, 1951.

5. Guilford, J. P., Wilson, R. C., & Christensen, P. R. A
   factor-analytic study of creative thinking. II. Administration of
   tests and analysis of results. Rep. Psychol. Lab., No. 8. Los
   Angeles: Univer. Southern California, 1952.

6. Guilford, J. P., Kettner, N. W., & Christensen, P. R. A
   factor-analytic study across the domains of reasoning, creativity
   and evaluation. I. Hypotheses and description of tests. Rep.
   Psychol. Lab., No. 11. Los Angeles: Univer. Southern California,
   1954.

7. Guilford, J. P., Kettner, N. W., & Christensen, P. R. A
   factor-analytic study across the domains of reasoning, creativity,
   and evaluation. II. Administration of tests and analysis of
   results. Rep. Psychol. Lab., No. 16. Los Angeles: Univer. Southern
   California, 1956.

8. Harris, R. H. The development and validation of a test of creative
   ability. Dissertation Abstr., 1955, 15, 1891. (Abstract)

9. Harris, R. H., & Simberg, A. L. AC test of creative ability.
   Examiners Manual. Flint, Mich.: General Motors Corp.

10. Jacobson, B. Clusters of responses to a personality inventory
    which are typical of creative engineers. Unpublished manuscript,
    Iowa State Coll., 1956.

11. Kruskal, W. H., & Wallis, W. A. Use of ranks in one-criterion
    variance analysis. J. Amer. statist. Ass., 1952, 47, 583-621.

12. Osborn, A. F. Applied imagination. New York: Scribner, 1953.

13. Owens, W. A., Schumacher, C. F., & Clark, J. B. Tests for
    creativity in machine design. Examiners Manual. Ames, Iowa: Ia.
    State Coll. Res. Found., 1956.

14. Snedecor, G. W. Statistical methods. (4th ed.) Ames, Ia.: Ia.
    State Coll. Press, 1946.

15. Taaffe, G. The relation of experimental tests of reasoning and
    creative thinking to research performance. Rep., Contract
    N6onr-23810. Los Angeles: Univer. Southern California, 1953.

16. Taylor, D. W. Research on problem-solving and creative thinking.
    Final Rep., Contract N6onr-25125. Stanford, Cal.: Stanford
    Univer. Press, 1956.

17. Taylor, D. W. Studies of problem-solving. Final Rep., Contract
    Nonr 225-02. Stanford, Cal.: Stanford Univer., 1956.

18. Thurstone, L. L. Creative talent. Rep. Psychometr. Lab., No. 61.
    Chicago: Univer. Chicago Press, 1950.





Journal of Applied 


Psychology 
Vol. 41, No. 5, 1957 


Personality Dynamics and Accident Proneness in an
Industrial Setting 1


Anthony Davids 


Brown University and Emma Pendleton Bradley Home 


and James T. Mahoney 


Boston University 


Freud’s contributions have made psycholo- 
gists well aware of the fact that unconscious 
motivations play a most important role in de- 
termining events of everyday life. It is com- 
mon knowledge, in the fields of psychology 
and psychiatry, that “accidents” are fre- 
quently not chance happenings but are linked 
in some way with dynamic factors within the 
person. Personality traits, emotions, attitudes, 
and other motivational variables, are believed 
to underlie the fact that some individuals 
seem to be unusually susceptible to misfor- 
tune, failure, and personal injury. Although 
at present there is rather wide-spread accept- 
ance of these notions regarding psychopathol- 
ogy, there has been little attempt to submit 
them to empirical verification in well-con- 
trolled experiments. 

The use of projective techniques should be 
of considerable help in studying these matters.
That is, since projective tests are designed
to provide valid information about personality
dynamics, they should be most useful
in investigations of relations between these
inner personal factors and susceptibility to 
physical injury. Therefore, in the present 
study we propose to use a projective test to 
study the influence of personality dispositions 
and attitudes on accident proneness in an in- 
dustrial setting. 


Method 
Subjects 


There were two groups of Ss, each consisting of 17 males who were
employees in an industrial plant. The Ss in the two groups were
carefully matched on the variables of age, education, intelligence,
socioeconomic background, and exposure to high accident hazards. For
one group (high accident), the mean age was 35.7 years, the mean
education was 11.3 grades, and the mean score on the Wonderlic
Personnel Test (Form A) was 24.5; for the other group (nonaccident),
the comparable means were 35.2, 11.1, and 24.9. None of these
differences is significant. The critical variable that differentiated
between the groups was the incidence of accident experience. During a
two-year period, from January 1, 1954, to December 31, 1955, the high
accident group had a total of 47 accidents reported to the Industrial
Accident Board of the Commonwealth of Massachusetts. The nonaccident
group had no accidents reported during this same period. In other
words, although the Ss in the two groups were matched on the types of
jobs they were performing and all worked in the same physical
setting,2 the Ss in one group were characterized by high accident
proneness, while the other group of Ss showed no susceptibility to
industrial accidents.

1 This study was conducted at the Process Engineering Corporation in
Methuen, Massachusetts. At the time of the study, the first author
was psychological consultant to the firm and the second author was on
the staff as personnel manager.


Sentence Completion Technique 


Description. The 100-item sentence completion 
test used in this experiment is a slightly modified 
version of an instrument constructed by Davids and 
described in connection with several previous studies 
(3, 4, 5). The test was designed originally (1) to 
secure data that would be pertinent to the problem 
of understanding the relative status of eight person- 
ality dispositions in any S. These eight dispositions 
are: (a) optimism, (b) trust, (c) sociocentricity, 
(d) egocentricity, (e) distrust, (f) pessimism, (g) 
anxiety, and (h) resentment. Of the 100 stems, 80
(10 for each disposition) are designed to relate to 
these dispositions. In the original version of the 
test, the remaining 20 stems were designed to be 
neutral. In the modification used in the present 
study, however, for six of the neutral stems we sub- 
stituted stems designed to indicate attitudes toward 
employment. Thus, in this form there were 10 stems 
designed to tap off associations indicative of each of 
the 8 dispositions, 6 stems designed to reveal work 
attitudes, and 14 neutral stems. These items designed to reveal work
attitudes, numbered according to their location in the test, were as
follows:

3. I think that bosses are ______
0. My feeling about work is ______
25. On the job ______
56. His work has been ______
67. Ken finds his boss to be ______
99. When I begin a new job I ______

2 The setting was that of an industrial field fabrication plant
concerned with the building of pressure vessels for oil and chemical
industries. The types of jobs held by Ss in both groups included such
classifications as helpers, assembly workers, welders, and fitters.

Administration. The Ss were tested in small groups, with Ss from both
the high accident and nonaccident groups being intermingled in the
same testing sessions. The Ss had no idea of the purpose of the study
and the basis for their selection was not revealed to them. Each S
received a mimeographed copy of the test and instructions, then
proceeded at his own rate of speed. The following instructions were
received by all Ss:

“Here are a series of incomplete sentences which you should complete
as rapidly as possible with the first thing that comes to mind.
Usually you will find that a brief phrase will complete the sentence
and sometimes you will find that a single word will do. You have only
20 minutes to finish this test and will have to work as rapidly as
possible (writing down the very first thing you think of) in order to
complete the task within this time limit.”

Scoring. Ten categories were utilized in scoring the sentence
completions (eight dispositions, negative employment attitude, and a
miscellaneous category), and every response was tallied under only
one of these categories, making a total of 100 responses per S.
First, a separate score was computed for each of the 10 categories.
Then, for each S, the sum of his scores on the three positive or
socially desirable dispositions (optimism, trust, sociocentricity)
was computed, and the sum of scores on the five negative dispositions
(egocentricity, pessimism, distrust, anxiety, resentment) was
computed. This cluster of socially undesirable traits has been
described elsewhere (2, 4) as forming a personality syndrome called
“alienation.” For purposes of the present study, another sum was
computed based on the S’s scores on the three negative traits of
egocentricity, anxiety, and resentment. Thus, excluding the
miscellaneous category, the Ss in the two groups could be compared on
a total of 12 personality measures derived from their responses to
this test.

All scoring was done “blind,” with the raters being unaware of the
identity of the S and not knowing in which group he belonged. The
mean percentage of agreement between two independent raters’ scoring
of the responses was 90%.


Table 1

Biserial Correlations Between High Accident vs. Nonaccident Groups
and Scores on Personality Variables, and Comparisons of Means for the
Two Groups

                                          High Accident   Nonaccident
                                              Mean           Mean
Variable                                    (N = 17)       (N = 17)
Optimism                                     10.35          12.12
Trust                                         7.59           9.41
Sociocentricity                              11.35          17.00
Affiliative syndrome (sum of 3 positive
  variables)                                 29.29          38.53
Pessimism                                     4.71           5.53
Distrust                                      6.76           6.71
Anxiety                                      10.76          10.18
Egocentricity                                 6.94           5.94
Resentment                                    7.94           6.53
Alienation syndrome (sum of 5 negative
  variables)                                 37.12        [illegible]
Modified alienation syndrome (sum of
  egocent., anx., and resentment)            25.65        [illegible]
Negative employment attitude                  3.06           1.16

[The biserial correlation column and the significance markers are
illegible in this copy.]

* Significant at .05 level for a 1-tailed test.
** Significant at .01 level for a 1-tailed test.
*** Significant at .001 level for a 1-tailed test.
Note. All statistically significant results shown in this table are
equally significant when the differences between the groups are
tested by means of the nonparametric Mann-Whitney “U” test (6).
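The scoring arithmetic described above can be made concrete with a short sketch. The category names follow the text, but the function name and the response counts for the example S are hypothetical.

```python
POSITIVE = ("optimism", "trust", "sociocentricity")
NEGATIVE = ("egocentricity", "pessimism", "distrust", "anxiety", "resentment")
MODIFIED = ("egocentricity", "anxiety", "resentment")
OTHER = ("negative_employment", "miscellaneous")


def syndrome_scores(tallies):
    """Derive the three composite scores from one S's ten category tallies.

    Each of the 100 responses is tallied under exactly one category,
    so the tallies must sum to 100.
    """
    expected = set(POSITIVE + NEGATIVE + OTHER)
    if set(tallies) != expected:
        raise ValueError("tallies must cover exactly the ten scoring categories")
    if sum(tallies.values()) != 100:
        raise ValueError("the ten tallies must sum to 100 responses")
    return {
        "affiliative": sum(tallies[name] for name in POSITIVE),
        "alienation": sum(tallies[name] for name in NEGATIVE),
        "modified_alienation": sum(tallies[name] for name in MODIFIED),
    }


# Hypothetical tallies for one S (not data from the study).
example_s = {
    "optimism": 10, "trust": 8, "sociocentricity": 11,
    "egocentricity": 7, "pessimism": 5, "distrust": 7,
    "anxiety": 11, "resentment": 8,
    "negative_employment": 3, "miscellaneous": 30,
}
print(syndrome_scores(example_s))
```

The eight dispositions, the negative employment score, and these three sums together give the 12 measures on which the groups were compared.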

Predictions 

It was predicted that the Ss in the high accident 
group, in comparison with the nonaccident Ss, would 
score lower on the positive personality dispositions, 
higher on the negative personality dispositions, and 
higher on negative attitudes toward employment. 










Results and Discussion 


The results presented in Table 1 clearly
confirm our prediction that high accident Ss 
would be characterized by relatively low 
scores on positive or socially desirable per- 
sonality dispositions. The single trait that 
differentiates most markedly between the two 
types of Ss is sociocentricity. The highly sig- 
nificant results obtained with the affiliation 
syndrome show that those Ss who had not ex- 
perienced accidents on the job were, accord- 
ing to their projective responses, more opti- 
mistic, trustful, and sociocentric. It should 
be pointed out that correlations of the magni- 
tude of — .73, which was obtained using the 
affiliation syndrome, are quite rare in studies 
attempting to relate personality measures de- 
rived from projective protocols to measures of 
overt behavior in everyday life situations. 

Although in general the high accident Ss 
tended to obtain slightly higher scores on the 
negative personality dispositions, statistical 
tests do not support our prediction of significant
differences between the groups. On four
of the five dispositions, however, the differences
are in the predicted direction. And on
the triadic cluster composed of egocentricity,
anxiety, and resentment, there is a significant
correlation between high scores and high ac- 
cident proneness. Thus there does seem to 
be a trend, but there is no doubt that the 
negative dispositions do not differentiate be- 
tween the groups nearly as effectively as do 
the positive dispositions. This finding may 
well be attributable to the fact that the Ss 
used in this study were “normal.” That is, 
they were not hospitalized psychiatric pa- 
tients, not neurotics being seen in a clinic set- 
ting, but were working men who were seen in 
an industrial setting. It is, therefore, under- 
standable that there were not more pro- 
nounced differences between these groups on 
traits that we would expect to discriminate 
very clearly between neurotic patients and 
normal Ss. 

The highly significant coefficient (+.70)
obtained with the variable of negative employment
attitude indicates that such an attitude
is highly associated with industrial accident
proneness. Even though the sentence
completion test contained relatively few stems 
designed to tap associations indicative of em- 
ployment attitudes, the results show that the 
high accident Ss tend to have a negative, so- 
cially undesirable, attitude toward their job, 
work in general, and their supervisors or 
bosses. 

The over-all results of this study are quite 
encouraging and suggest the feasibility of put- 
ting projective techniques to use on problems 
such as the one reported here. Much further 
research remains to be done in the area of 
accident proneness, and there are numerous 
other industrial problems that could be studied 
fruitfully by means of projective tests. In 
addition to sentence completion techniques, it 
would seem that instruments like the Ror- 
schach and the TAT could be used to good 
advantage in attempting to gain further un- 
derstanding of relations between personality 
attributes and on-the-job performance of per- 
sonnel at all levels of the industrial hierarchy. 


Summary 


Two groups of Ss, one composed of men 
who were high on accident proneness and the 
other composed of men who suffered no acci- 
dents in the same industrial setting, were 
studied by means of a sentence completion 
test. Responses to this projective technique
indicated that the high accident Ss, in comparison
with the nonaccident Ss, were significantly
lower on the socially desirable personality
dispositions of optimism, trust, and
sociocentricity. In general, there were no sta- 
tistically significant differences between the 
two groups on several negative personality 
dispositions, but there was a slight indication 
of positive association between high accident 
proneness and high scores on a cluster com- 
posed of the socially undesirable personality 
dispositions of egocentricity, anxiety, and re- 
sentment. There was highly significant asso- 
ciation between high accident proneness and 
projective responses indicative of a negative 
attitude toward employment. 


Received January 15, 1957. 





Anthony Davids and James T. Mahoney 


References 


Davids, A. The influence of personality on auditory
apperception and memory. Unpublished
doctor's dissertation, Harvard Univer., 1953.

Davids, A. Alienation, social apperception, and
ego structure. J. consult. Psychol., 1955, 19,
21-27.

Davids, A. Comparison of three methods of personality
assessment: direct, indirect, and projective.
J. Pers., 1955, 23, 423-440.

Davids, A. Generality and consistency of relations
between the alienation syndrome and cognitive
processes. J. abnorm. soc. Psychol., 1955, 51,
61-67.

Davids, A., & Murray, H. A. Preliminary appraisal
of an auditory projective technique for
studying personality and cognition. Amer. J.
Orthopsychiat., 1955, 25, 543-554.

Mann, H. B., & Whitney, D. R. On a test of
whether one of two random variables is stochastically
larger than the other. Ann. math.
Statist., 1947, 18, 50-60.





Journal of Applied Psychology 
Vol. 41, No. 5, 1957


The Relationship Between the Edwards Personal Preference 
Schedule Variables and the Minnesota Multiphasic 
Personality Inventory Scales ¹


Robert M. Allen 


University of Miami 


A recent paper by Merrill and Heathers 
(6) presented a correlation matrix for the Ed- 
wards Personal Preference Schedule (PPS) 
variables and the Minnesota Multiphasic Per- 
sonality Inventory (MMPI) scales for a 
group of 155 college males known to the 
counseling center of a university. The coeffi- 
cients, however, were tetrachoric r’s with the 
attendant sampling error and effect upon sig- 
nificance. 
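The contrast between a tetrachoric coefficient and a Pearson r computed on the same dichotomized data can be illustrated with a small sketch. The source does not say how Merrill and Heathers computed their tetrachoric r's; the cosine-pi formula used here is one classical approximation and is an assumption, as are the function names and the 2x2 table, which is hypothetical and not data from either paper.

```python
from math import cos, pi, sqrt


def phi_coefficient(a: int, b: int, c: int, d: int) -> float:
    """Pearson r computed directly on a 2x2 table [[a, b], [c, d]]."""
    num = a * d - b * c
    den = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return num / den


def tetrachoric_cos_pi(a: int, b: int, c: int, d: int) -> float:
    """Cosine-pi approximation to the tetrachoric r for a 2x2 table.

    a and d are the concordant cells (high-high, low-low); b and c are
    the discordant cells. Requires b * c > 0.
    """
    return cos(pi / (1 + sqrt((a * d) / (b * c))))


# Hypothetical median-split table for 155 cases (not data from the paper).
a, b, c, d = 48, 30, 29, 48
print(round(phi_coefficient(a, b, c, d), 3))
print(round(tetrachoric_cos_pi(a, b, c, d), 3))
```

On a table like this one the tetrachoric estimate is noticeably larger than the coefficient computed directly on the dichotomies, and because each score is reduced to "high" or "low" the estimate rests on less information than a Pearson r on the raw scores.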


Problem 


This study was designed to obtain the intercorre- 
lations between the PPS (2) variables and the MMPI 
(5) scales for college men and women and for both 
groups combined. Eighty-two males and forty-eight 
females in undergraduate classes at the University 
of Miami comprised the study group solely on the 
basis of being in a particular classroom during a 
given hour. No requirement other than being a 
member of the class was used. The median male 
was 22 years of age, while the median female was 
20 years old. Both were upper sophomores. Raw 
scores were obtained for the two groups of subjects.
These were converted into T scores as provided in
the manuals for both tests (1).


Results 


The mean raw and T scores for the three
groups (M—Males, F—Females, T—Total)
are within normal limits. The T scores for
the PPS and the MMPI range from 41 to 59
and from 30 to 70, respectively. The relatively
high Mf T score of 63 for the males is
not unique to this local college group. Good- 
stein’s (3) investigation of college students in 
various sections of the country has consist- 
ently shown Mf scale T scores in the upper 
fifties and lower sixties. 


¹ Completion of this project was made possible by
a grant from the Penrose Fund, American Philosophical
Society of Philadelphia, Pa. Thanks are due to
Dr. Raymond E. Hartley, Mr. Jeffrey I. Dallek, and
Mr. Don Lichtenstein of the Department of Psychology,
and to Dr. C. Lee Philips, Director of the
University of Miami Testing Bureau, for their assistance
in organizing the statistical data.




Each of the fourteen MMPI scales was
correlated with each of the fifteen PPS variables
for the male, female, and total groups.
Of the 630 correlations in the matrix,² 69
were significant at the 5% to 1% levels and
21 at the 1% level or below. Table 1 is a
simplified matrix of the correlation cells which
have at least one coefficient significant at the
5% level for one of the three populations.
The coefficients omitted from Table 1 are not
significantly different from zero correlation.
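The size of the matrix and a rough chance baseline can be checked with a few lines of arithmetic. The sketch below assumes, purely for illustration, that the 630 tests are independent. They are not, since the total group pools the male and female groups, so the chance figures are only a loose yardstick. The variable names are illustrative and do not come from the paper.

```python
# Counts reported in the text.
n_mmpi_scales = 14
n_pps_variables = 15
n_groups = 3  # male, female, total

n_correlations = n_mmpi_scales * n_pps_variables * n_groups

observed_5_to_1 = 69  # significant between the 5% and 1% levels
observed_1 = 21       # significant at the 1% level or below

# Expected counts if every true correlation were zero and all tests
# were independent (an illustrative assumption, not true of these data).
expected_1 = 0.01 * n_correlations
expected_5_to_1 = (0.05 - 0.01) * n_correlations

print(n_correlations)
print(f"1% level: observed {observed_1}, chance expectation {expected_1:.1f}")
print(f"5% to 1%: observed {observed_5_to_1}, chance expectation {expected_5_to_1:.1f}")
```

Under that simplifying assumption, both observed counts are well above what chance alone would produce.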

The intercorrelations of the PPS variables 
for all three groups are shown in Table 2. 
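Because the three groups differ in size, the same coefficient can be significant in one group and not in another. A minimal sketch of the thresholds follows, assuming the Fisher z approximation rather than whatever table the author consulted. The function name `approx_critical_r` is hypothetical.

```python
from math import sqrt, tanh
from statistics import NormalDist


def approx_critical_r(n: int, alpha: float) -> float:
    """Approximate two-tailed critical |r| for a sample of size n.

    Uses the Fisher z approximation: atanh(r) is roughly normal with
    standard error 1 / sqrt(n - 3) when the true correlation is zero.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return tanh(z_crit / sqrt(n - 3))


for label, n in [("males", 82), ("females", 48), ("total", 130)]:
    r05 = approx_critical_r(n, 0.05)
    r01 = approx_critical_r(n, 0.01)
    print(f"{label:8s} N={n:3d}  |r| >= {r05:.2f} (5%)  |r| >= {r01:.2f} (1%)")
```

Under this approximation the 5% thresholds come out near .22 for the males, .28 for the females, and .17 for the total group, so a coefficient of a given size is more easily significant in the larger groups.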


Discussion 


The local male and female students follow
the same general pattern as other college students
on the PPS and the MMPI (2, 3).
When the raw scores on these two inventories
are intercorrelated by the Pearson r procedure,
eighteen cells of Table 1 show coefficients significant
at the 1% level and below. These
eighteen cells contain the twenty-one correlations
mentioned above. (Three cells have
two coefficients each significant at the 1%
level.) Eleven of these involve the males.
This is in contrast to the much higher number
of intercorrelations for the male subjects
reported by Merrill and Heathers (6) using
the tetrachoric method. A discussion of the
eighteen cells follows.³




² Tables giving the mean raw scores, the standard
deviations, and the T scores obtained for the three
student groups as well as the complete intercorrelation
matrix for the PPS variables and MMPI scales
have been deposited with the American Documentation
Institute. Order Document No. 5284 from
ADI Auxiliary Publications Project, Photoduplication
Service, Library of Congress, Washington 25, D. C.,
remitting in advance $1.25 for microfilm or $1
for photocopies. Make checks payable to Chief,
Photoduplication Service, Library of Congress.

³ The PPS variables are: Ach, Achievement; Def,
Deference; Ord, Order; Exh, Exhibition; Aut, Autonomy;
Aff, Affiliation; Int, Intraception; Suc,
Succorance; Dom, Dominance; Aba, Abasement;
Nur, Nurturance; Chg, Change; End, Endurance;
Het, Heterosexuality; and Agg, Aggression. Consistency
is the final nonclinical variable. Each is defined
in Edwards (2, p. 5).

The MMPI scales are: ?, Question; L, Lie; F, Validity;
K, Suppressor; Hs, Hypochondriasis; D, Depression;
Hy, Hysteria; Pd, Psychopathic Deviate;
Mf, Interest; Pa, Paranoia; Pt, Psychasthenia; Sc,
Schizophrenia; Ma, Hypomania; Si, Social introversion-extroversion.
Throughout this article Hs,
Pd, Pt, Sc, and Ma are to be considered with appropriate
amount of K added to each.


Table 1


PPS-MMPI Intercorrelations Significant at the 1% and 5% Levels*

[Table body not legible in the source. Columns: PPS variable, MMPI scale, and coefficients for the M, F, and T groups.]

* Italicized figures are significant at the 1% level. Decimals are omitted.


1. Achievement—Validity (Ach—F) covary
positively and substantially for the males.
The shared personality dimension may be in
the realm of the need to achieve, to yield to
the drive for quality ambition as it covaries
with the tendency to respond rationally to inventory
items.⁴

2. Heterosexuality—Schizophrenia (Het—Sc)
are closely and directly related for the
female students. Concomitant with their response
to opposite-sex orientation, there appears
to be a reduction in their rational
evaluation of the situation. The female group
tends to be more emotional and less tied to
reality than males in meeting heterosexual
needs.

⁴ The discussion of the underlying unities is based
on Edwards (2, p. 5) for the PPS variables and on
Hathaway and McKinley (5, pp. 18-21) for the
MMPI scales.

3. Aggression—Psychasthenia (Agg—Pt) as 
a positively related pair for the female group 
reflects the conformity to social expectancy, 
i.e., women should not be aggressive, must 
not yield to their need for disparaging or at- 
tacking others. To “be a lady” at all times 
is well ingrained in college students by their 
middle-class morality. The underlying co- 
varying trait is the reaction toward persons 
and events in the social environment—the 
greater the need for aggressive expression the 
greater the guilt reaction and need for com- 
pulsive threat-reducing behavior. 





Relationship Between the Edwards PPS and the MMPI 


Table 2


Intercorrelations for PPS Variables for the Male (M), Female (F), and Total (T) Groups *

[Table body not legible in the source.]


4. Exhibition—Social Introversion-Extroversion
(Exh—Si) are significantly and inversely
related personality dimensions for the male
group. The underlying unity is the nature 
of the response to the social world with “out- 
goingness” and autistic behavior as the bi- 
polar extremes of the continuum. The fe- 
male students do not respond as definitively 
as the males. 

5. Autonomy—Validity (Aut—F) form a 
significant pair at the 1% level for the male 
and total groups. This suggests a moderately 
direct relationship between the need for per- 
sonal freedom and nonconformity on the one 
hand and behavior that is reflective of care- 
lessness or lack of rational restraint in re- 
sponding to situations on the other. 

6. Affiliation—Validity (Aff—F) are tenu- 
ously tied. In the Affiliation variable the 
need is for an extra-personal orientation that 
is somewhat antithetical to the lack of per- 
sonal restraint, F. The personality bipolarity 
involves the individual's willingness to chan- 
nelize interpersonal ties. 

7. Dominance—Suppressor (Dom—K) is a
covarying pair for the male and total groups.
Dominance requires little elaboration. K is
described by Hathaway and McKinley as
“. . . defensiveness against psychological
weakness . . .” expressed as a need to appear
more “normal” (5, p. 18).

8. Nurturance—Interest (Nur—Mf) yield 
significant r’s for the male and total groups. 
Nurturance symbolizes an “other person” 
orientation, a drive to be constructively useful
to others. Mf reflects important cultural
and intellectual influences in the life of the 
college-trained person. The essential unity 
in these two test scales is an intellectual in- 
terest in service to mankind, especially for the 
males. 

9. In Endurance—Psychasthenia (End—Pt)
the shared personality affinity may be
placed along a time-perseverance continuum 
with the need for closure as a method of cop- 
ing with threat inversely associated with a 
ritualistic reaction to actual or potential 
threat. 

10. Deference—Schizophrenia (Def—Sc)
are inversely tied, significantly for females.
The bipolarity of the underlying trait-continuum
ranges from the need for leaning upon
other persons (Deference) to the impaired 
interpersonal contacts implied in the intro- 
versive and/or autistic behavior of the Schizo- 
phrenic extreme. 

11. Order—Lie (Ord—L) is a substantially 
and positively related pair for the females. 
The behavioral dynamic is a basic defense 
mechanism best characterized as a compul- 
sion to put one’s best foot forward even to 
the extent of “gilding the lily” about oneself. 

12. Autonomy—Psychopathic Deviate (Aut 
—Pd) are significantly correlated attributes 
for the female and total groups. The trait- 
continuum involved is the primacy of one’s 
own impulses without regard for others if this 
interferes with self-satisfactions. 

13. Aggression—Interest (Agg—Mf) is an
indirectly related pair for the female and to- 
tal populations. The need for expressing hos- 
tility is at one end of the continuum and 
varies inversely with an intelligent considera- 
tion for people. 

14. Nurturance—Psychopathic Deviate 
(Nur—Pd), as a related pair, are both logi- 
cally and psychologically antithetical to each 
other. Both involve social attitudes and the 
use of others for the satisfaction of personal 
needs. 

Four cells in the matrix of Table 1 have 
significant coefficients for the three popula- 
tion groups in each: 

15. The Order—Suppressor (Ord—K) correlations
suggest that the mutual bond between
these variables may be the manner in
which the individual reconciles the desire for 
regularity, organization, and integrity for de- 
tail with the need to deviate from this com- 
pulsive adherence to reality when personal 
threat is involved. This relationship is an in- 
verse one. 

16. Autonomy—Schizophrenia (Aut—Sc) 
are positively intercorrelated due to the com- 
mon personality dimension of self-centered- 
ness. 

17. & 18. Dominance—Social Introversion-Extroversion
(Dom—Si) and Abasement—Social
Introversion-Extroversion (Aba—Si)
may be considered together. The common
tie is the Si scale which is described as “. . .
the tendency to withdraw from social contact
with others” (5, p. 21). With this clinical
scale as one end of the two personality
continua (a) Dominance forms an inversely
related need whose manifest expression involves
activity that is quite the opposite of
the socially introversive clinical syndrome,
and (b) Abasement, whose dynamic ingredients
spring from somewhat similar sources as
Si, does correlate directly with Si for the three
groups.

Of the correlation coefficients discussed 
above, only five indicate a substantial degree 
of relationship between the PPS variables and 
the MMPI scales by virtue of their absolute 
sizes: Ach—F and Exh—Si for males, and 
Ord—L, Nur—Pd, Het—Sc for females.⁵ The
other correlations suggest that the two tests
probe into relatively independent areas
of personal and social living.

In regard to Table 2, forty cells show 
substantially greater than zero relationships 
among the PPS variables. There are differ- 
ences in some need-relationships expressed by 
the male and female groups. The males as a 
group show statistically substantial correla- 
tions between these pairs of PPS needs (see 
Table 2 for the sizes and directions of the 
r’s): Def—Ach; Aff—Aut, Int, and Nur; 
Aba—Dom and Chg; Chg—Aut; Nur—Agg 
and Chg; and End—Exh. The female stu- 
dents disclose significant correlations for: 
Suc—Def, Ord, and End; Aut—Aba, Agg, 
and Het; and Agg—Def, Het, and Ord. 
These pairs of need-variables do have basic 
personality unities in common which require 
further intensive analysis. These unities appear
to be sex linked.


Summary and Conclusions 


One hundred and thirty male and female
college undergraduates were administered the
Edwards Personal Preference Schedule and
the Minnesota Multiphasic Personality Inventory.
The raw scores for the fifteen PPS
variables were intercorrelated with the fourteen
MMPI scales. Three correlation matrixes
were obtained for the 82 males, 48 females,
and the total combined group of 130
subjects. The resulting 630 intercorrelations
were combined into one correlation matrix.
From this matrix eighteen cells with at least
one coefficient significant at the 1% level
were selected and presented in Table 1.
These were discussed in detail in terms of
their statistical significance, the definitions,
and the communalities involved in each intercorrelated
pair of personality dimensions.
Only five pairs were substantially related to
each other. In general, the PPS and MMPI
are fairly independent in regard to the areas
each presumes to assess.

The significant and fairly substantial intercorrelations
among many PPS variables
suggest a re-examination of the relative independence
that is claimed for the components
of this inventory.

⁵ The criteria for the degree of relationship that
may be inferred from the absolute size of a correlation
coefficient are taken from Guilford (4, p. 165).


Received December 10, 1956. 


References 


1. Allen, R. M. Edwards Personal Preference Schedule
intercorrelations for two groups. Unpublished
manuscript, Univer. of Miami, 1957.

2. Edwards, A. L. Manual for the Personal Preference
Schedule. New York: Psychological
Corp., 1954.

3. Goodstein, L. D. Regional differences in MMPI
responses among male college students. J.
consult. Psychol., 1954, 18, 437-441.

4. Guilford, J. P. Fundamental statistics in psychology
and education. (2nd ed.) New
York: McGraw-Hill, 1950.

5. Hathaway, S. R., & McKinley, J. C. Minnesota
Multiphasic Personality Inventory; manual.
(Rev. ed.) New York: Psychological Corp.,
1951.

6. Merrill, R. M., & Heathers, L. B. The relation
of the MMPI to the Edwards Personal Preference
Schedule on a college counseling center
sample. J. consult. Psychol., 1956, 20,
310-314.





Journal of Applied Psychology 
Vol. 41, No. 5, 1957 


An Investigation of Dissimulation on the MMPI by Means 
of the “Lie Detector” ¹


Allen D. Calvin 
Hollins College 


and Charles Hanley 


Michigan State University 


In their recent defense of the MMPI, Calvin 
and McConnell (1) point out that deception 
on the MMPI can usually be detected by the 
special scales devised for this purpose. How- 
ever, the validation procedures used for these 
scales were relatively indirect. The present 
study was designed to attack this problem by 
a “lie-detector” technique. 


Method 


Subjects. The MMPI was administered in group 
form to approximately 300 undergraduate psychology 
students from Michigan State University. From this 
group, 13 were selected who showed evidence of dis- 
simulation. 

Nine of the 13 (7 males and 2 females) appeared 
to be “faking good.” These will be referred to as 
the Fake Good group. They were selected on the 
basis of Cofer, Chance, and Judson’s (3) key for 
detecting “positive malingering.” One S had a score 
of 23, three had scores of 20, three had scores of 19, 
and two had scores of 18. Using a cut-off score of 
10 and above with college students, Cofer et al. were 
able to correctly identify 96% of the “honest” rec- 
ords and 86% of the positive malingerers. (No 
other cut-off scores were presented.) Each S was 
matched very closely on the other uncorrected 
MMPI scales with a control S of the same sex. The 
mean Cofer et al. score for the controls was 11.7. 

Four males appeared to be “faking bad.” They
will be referred to as the Fake Bad group. Two of
these Ss had scores of +4, one had a score of +1, and
one had a score of zero as measured by the F-minus-K
index, for which Gough (6) has presented cut-off
scores for college students. (A score of zero or more
was found in only 2% of college students, but included
88% of the experimental dissemblers.) Again
each S was closely matched in terms of his other uncorrected
MMPI scales with a control S of the same
sex. The F-minus-K scores for the controls were
8,-—7,—-5, 4.

¹ The authors wish to thank Professors Charles N.
Cofer, Starke R. Hathaway, Paul E. Meehl, and
Ephraim Rosen for their valuable suggestions and
criticism. Any failings of the present study which
still remain are of course the responsibility of the
present authors. In addition, we wish to gratefully
acknowledge the assistance of Patricia Fritts who
acted as an E and Elizabeth Karkanen who also
acted as an E and who aided in the preparation of
the procedure section. Some of the data in the present
paper were presented at the 1956 SEPA meeting.

Table 1 presents the mean uncorrected raw scale
scores for the faking Ss and their controls. None of
the differences between the two groups are significant.

Procedure. The Ss were told to report to the testing
room for individual sessions. The room was
small, moderately lighted, and contained only a table,
two chairs, and the polygraph in order to provide as
little distraction as possible for the Ss.

As each S began the individual session, he was told
briefly the basic principles of the Keeler polygraph
and informed that he was being tested to determine
the extent to which students lie on a long personality
test such as the MMPI. The S was further told that
he had been randomly selected for the purpose of obtaining
unbiased data.

The instructions on the front of the MMPI testing
booklet were read. The instructions state that
an S is to answer the questions “True” if the statement
is true or mostly true, and “False” if the statement
is false or mostly false.

After the Ss were adjusted to the machine, they
were told they would first be asked a few “test”
questions to which they should answer “Yes.” The
Ss were then asked a series of questions designed to
elicit “lies” for the later use of the polygraph operator
in analyzing the data.

The 566 test questions of the MMPI were asked
in groups of 33-35 questions each, with a time interval
of ten seconds between each question, ten to
fifteen minute breaks being taken between the groups.

The question number and response were recorded
by E who also noted possible outside influences such
as coughs, laughs, and machine adjustments. The Es
who conducted the polygraph interviews and examined
the records did not know to which group
the Ss belonged.


Results 
“Lies.” Each polygraph record was scored
for “lies” by one E. In order to obtain a
check on the reliability of the scoring, both
Es independently scored one record from the
Fake Good group. Examiner X found nine
“lies” and Examiner Y five. All five of Y’s





Dissimulation on the MMPI 


Table 1 


Mean Original MMPI Raw Scores for Faking Ss and Controls 


Hs D Hy Pd 


17.8 
15.8 


4.3 
3.6 


18.8 
18.5 


20.8 
21.0 


Faking 
Control 


lie notations were included in the nine desig- 
nated as lies by X. Since there are 566 ques- 
tions on the MMPI, this indicated better than 
99% agreement between the two Es.² Examiner
X consistently attributed more lie responses
to those records she analyzed, her
mean score being 14.5 while that for Y was 
9.7. This difference is significant beyond the 
5% level. Each E had been given as nearly 
as possible the same number of Ss from the 
faking and control groups to examine. In no 
case was there a difference of more than one S 
in a particular subgroup for a given E. Each 
E analyzed her own data except in one case, 
hence this variation between Es, whether due 
to examining techniques or difference in analy- 
sis, should have been fairly well balanced out. 
The mean number of lies for each group was: 
Fake Good, 13.4 and Fake Good Control, 
11.8; Fake Bad, 9.3 and Fake Bad Control, 
12.7. No differences were significant. 
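The agreement figure quoted above can be reproduced from the counts in the text. The sketch below also computes two supplementary indices that the authors did not report: Cohen's kappa, a later chance-corrected statistic, and agreement restricted to items that at least one examiner marked. Both are included only to illustrate the point made in the authors' footnote that "lies" matter more than "non-lies"; the variable names are illustrative.

```python
# Counts taken from the reliability check described above.
n_items = 566
both_lie = 5        # items marked as "lies" by both X and Y
x_only = 9 - 5      # items marked only by X
y_only = 5 - 5      # items marked only by Y
neither = n_items - both_lie - x_only - y_only

# Simple percent agreement, the figure quoted in the text.
p_observed = (both_lie + neither) / n_items

# Agreement expected by chance from each examiner's marginal rates.
x_rate = (both_lie + x_only) / n_items
y_rate = (both_lie + y_only) / n_items
p_chance = x_rate * y_rate + (1 - x_rate) * (1 - y_rate)

# Cohen's kappa: agreement beyond chance (not reported in the original).
kappa = (p_observed - p_chance) / (1 - p_chance)

# Agreement restricted to items that at least one examiner marked.
positive_agreement = both_lie / (both_lie + x_only + y_only)

print(f"percent agreement = {p_observed:.4f}")
print(f"kappa = {kappa:.3f}")
print(f"agreement on marked items = {positive_agreement:.3f}")
```

Percent agreement is dominated by the several hundred items that neither examiner marked, which is why the two supplementary indices come out a good deal lower.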

Changed Responses. The mean number of 
changed answers for each group was: Fake 
Good, 84.8; Fake Good Control, 87; Fake 
Bad, 123.5; and Fake Bad Control, 125.5. 
The difference between the faking and control
groups was not significant.

Within each group a correlation was com- 
puted between number of lies and changes in 
the validating scores (F-minus-K for Fake 
Bad and Fake Bad Control, and Cofer et al.’s 
score for the Fake Good and Fake Good Con- 
trol). The only significant correlation, .77 
(p < .05), was found in the Fake Good Con- 
trol. When the Fake Good and Fake Good 
Control groups were combined, a significant 
correlation of .65 (p < .01) was obtained. 
(Thus, a smaller number of lies was associ- 


² It should be pointed out that since “lies” were in
a sense more important than “non-lies,” a lack of reliability
in the judgment of “lie” responses may have
militated against our hypotheses.


M/ Pa Pt Ma 


18.8 
15.3 


29.1 
28.4 


12.2 
11.8 


10.6 
O4 


ated with a decrease in Cofer scores for this 
group.) 
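The significance levels attached to these two coefficients can be checked with the usual conversion of r to t. This is a sketch under stated assumptions: the source does not say which correlation coefficient or test the authors used, the sample sizes (9 and 18) are inferred from the group sizes given in the Method section, and the critical values are copied from a standard two-tailed t table.

```python
from math import sqrt


def t_from_r(r: float, n: int) -> float:
    """t statistic for testing a Pearson r against zero, with n - 2 df."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)


# Two-tailed critical t values copied from a standard t table.
T_CRIT = {(7, 0.05): 2.365, (16, 0.01): 2.921}

# Fake Good Control: r = .77; n = 9 is assumed from the group size.
t_control = t_from_r(0.77, 9)
# Fake Good plus Fake Good Control: r = .65; n = 18 is assumed likewise.
t_combined = t_from_r(0.65, 18)

print(f"t(7)  = {t_control:.2f}, exceeds 5% critical value: {t_control > T_CRIT[(7, 0.05)]}")
print(f"t(16) = {t_combined:.2f}, exceeds 1% critical value: {t_combined > T_CRIT[(16, 0.01)]}")
```

With these assumed sample sizes both coefficients clear the levels reported in the text.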

The correlation within each group between 
changes in the appropriate validating scale 
and number of changed answers was computed
and the following r’s obtained:³ Fake
Bad, .48; Fake Bad Control, .57; Fake Good,
−.72 (p < .05); Fake Good Control, .12.
The difference between the latter two r’s is 
significant beyond the 5% level. 
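One common way to compare two correlations is the Fisher z test, sketched below. The source does not state which procedure the authors used, so this is an illustration only. It assumes n = 9 in each group and treats the two groups as independent samples, which is itself an approximation because each control was matched to a faking S. The function name is hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist


def fisher_z_difference(r1: float, n1: int, r2: float, n2: int) -> float:
    """z statistic for the difference between two independent correlations."""
    se = sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (atanh(r1) - atanh(r2)) / se


# r values from the text; n = 9 per group is assumed from the group sizes.
z = fisher_z_difference(0.12, 9, -0.72, 9)
p_one_tailed = 1 - NormalDist().cdf(abs(z))
p_two_tailed = 2 * p_one_tailed

print(f"z = {z:.2f}")
print(f"one-tailed p = {p_one_tailed:.3f}, two-tailed p = {p_two_tailed:.3f}")
```

Under these assumptions the statistic is about 1.78, which passes the 5% level on a one-tailed test but not on a two-tailed one.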

In none of the groups was the correlation 
between changed answers and number of lies 
significant. 

With respect to the number of times a “lie” 
occurred on a changed answer, there was no 
significant relationship in any one group, but 
for all Ss combined, significantly fewer “lies” 
occurred on changed answers (p < .05). 

There was no significant difference between 
Es in number of answers changed by their Ss. 

Table 2 presents the changes in uncorrected 
T scores for the Fake Good and Fake Good 
Control groups. There is no significant difference
between the two groups on any subscale.

Table 3 presents the changes in uncorrected 
T scores for the Fake Bad and Fake Bad Control
groups. The Fake Bad group is significantly
lower (p < .05) on D and Mf.

Table 4 presents the combined changes in
uncorrected T scores for all Ss except the


³ In evaluating the relationship between changes in
the validating scales and total answers changed, it
should be remembered that there is item overlap because
the total changed answers include the changes
on the validating scales. However, since the validating
scales can change in either direction (the final
scale score change is the number of answers changed
in a positive direction minus the number changed in
the negative—thus an S might change four answers
in a positive direction and four in a negative and end
up with no change in his scale score), and since the
percentage of validating scale changes in the total
answers changed was so slight, a correction was not
considered appropriate.







Table 2 


Changes in T Scores for Fake Good and Fake Good Controls 


Hs D 


Fake Good 1.4 


Fake Good Control 1 -2.0 


Fake Bad. The following differences between
subscale changes were significant: Hy vs. Pt
(p < .01), Hy vs. Sc (p < .01), Hy vs. Ma
(p < .05), Pd vs. Pa (p < .05), Pa vs. Ma
(p < .05). When all subscales to which K is
normally added are compared with all subscales
which do not receive the standard K
correction, the difference is significant beyond
the 1% level.⁴ These last tests of significance
were made with Sandler’s modification of the
t test (8).


Discussion 


We will first discuss the Fake Good group. 
The fact that their profiles (and those of their 
controls) are similar to normal college students⁵
is in accord with Cofer, Chance, and
Judson’s finding. They report: “The results 
for the group may now be considered. Here, 
most of the differences between the honest and 
the malingered scores for the nine diagnostic 
scales are no greater than the difference shown 
by the controls between the test and the re- 
test. Only the mean D and Hs scores of the 
positive malingerers are significantly lower 
than their honest scores” (3, p. 496). Inter- 
estingly enough our Hs and D scores for the 
Fake Good group’s original test are also the 
only scales markedly lower than those of 
“normal” profiles. 


⁴ This analysis did not include the Fake Bad group
because while the other groups (the Fake Good and
the two “normal” control groups) would be expected
to be somewhat defensive and therefore show a
greater drop on K than non-K scales, there is no apparent
reason to expect such behavior on the part of
the Fake Bad Ss, but interestingly enough their inclusion
would not have changed the significant difference
between K corrected and non-K corrected
scales. (See Table 3.)

⁵ Although our analyses used uncorrected subscales,
since a number of the more recent studies (e.g., 5, 9)
reported results with the K correction added, we
made the appropriate K correction when necessary
for comparison.


Hy 


29 ous 


Pd Mf Pa Pt 


1.6 
1.6 


1.1 ye as Ae 
5.9 9 6 


4.0 
1.7 


The failure to find a significant difference 
between the Fake Good and Fake Good Con- 
trol groups either in “lies” or number of an- 
swers changed from the standard session to 
the experimental session indicates either that 
the scale is invalid or, on the other hand, we 
may have run into an example of the Meehl- 
Rosen paradox (7). We have no adequate 
estimate of the frequency of “faking good” in 
our population and it is possible that there is 
so little of this that, even though the scale 
may be highly sensitive, the absolute number 
of defensive Ss in our population would force 
the scale to be “inaccurate.” For a fuller dis- 
cussion of this point see (7). 
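The base-rate argument in this paragraph can be made concrete with Bayes' rule. The sketch below takes the hit rates quoted in the Method section at face value (86% and 96% for the Cofer et al. key, 88% and 98% for an F-minus-K score of zero or more) and tries several base rates. The base rates are hypothetical, since, as the authors note, the true frequency of faking in their population is unknown. The function name is illustrative.

```python
def positive_predictive_value(sensitivity: float, specificity: float,
                              base_rate: float) -> float:
    """Share of flagged records that are truly dissimulated (Bayes' rule)."""
    true_pos = sensitivity * base_rate
    false_pos = (1 - specificity) * (1 - base_rate)
    return true_pos / (true_pos + false_pos)


# Hit rates quoted in the Method section.
cofer = {"sensitivity": 0.86, "specificity": 0.96}  # Cofer et al. key
gough = {"sensitivity": 0.88, "specificity": 0.98}  # F-minus-K >= 0

# Base rates are hypothetical; the true rate in this population is unknown.
for base_rate in (0.50, 0.10, 0.03, 0.01):
    ppv_good = positive_predictive_value(base_rate=base_rate, **cofer)
    ppv_bad = positive_predictive_value(base_rate=base_rate, **gough)
    print(f"base rate {base_rate:.2f}: Cofer PPV {ppv_good:.2f}, F-K PPV {ppv_bad:.2f}")
```

At an assumed base rate of 3 percent, fewer than half of the records flagged with the Cofer et al. hit rates would be true cases of faking, even though the key identifies most fakers and most honest records correctly.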

Another possibility of course is that our 
polygraph operators were unable to detect 
“lies” and that the Ss were not impressed 
enough with the “lie-detector” situation to
answer the questions more honestly in the 
experimental situation. Cofer, Judson, and 
Weick (2) used PGR with the MMPI and 
concluded that the PGR would not prove a 
useful instrument for detection of faking. 
They did obtain a rank order correlation of 
.28 between size of deflection and K, but it 
had a probable error of .11. Since they did 
not attack the problem of malingering di- 
rectly, their study does not seem to offer any- 
thing directly pertinent to the present prob- 
lem. The observations of our Es and the 
verbal comments of our Ss seemed to indicate 
that the Ss were influenced by the “lie-detector”
atmosphere of the experimental session.
Although such reports are interesting,
they are of course not definitive. More rele- 
vant is the correlation of .65 between change 
in the Cofer, Chance, and Judson’s scale and 
number of “lies” told in the combined Fake 
Good and Fake Good Control groups. The 
fact that those Ss who told the fewest “lies” 







Table 3 
Changes in T Scores for Fake Bad and Fake Bad Controls 
Hs D Hy Pd 


Fake Bad 6.0 12.3 4.3 4.0 6.3 
Fake Bad Control 1.8 6.8 3.0 2.0 1.0 


also showed the greatest drop in their Cofer 
et al. scores lends support to both the “lie” 
analysis and the scale. This finding—prob- 
ably one of the most important obtained in 
the study—leads to the conclusion that while 
the Cofer et al. scale does not seem to be able 
to separate directly defensive from “normal” 
Ss in our experimental situation (as indicated 
by the lack of significant differences between 
the Fake Good and Fake Good Control group 
in terms of number of “lies,” number of
changed answers, and changes on specific subscales),
a change in the scale is indicative of
a change in the honesty of the Ss’ answers. 
Although changes in Cofer et al. scores 
failed to discriminate the Fake Good from the 
Fake Good Control group, an examination of 
the relationship between changes in this scale 
and total number of changed answers indicates 
that the Fake Good group behaved differently
from their controls in this case. While the
Fake Good group did show the expected relationship
(a significant correlation of −.72,
indicating that as the Cofer et al. scale scores
went down the number of changed answers
increased) the Fake Good Control group
failed to show this relationship with an r of
.12. The difference between the two groups
in this case was significant, indicating that the
Cofer et al. scale is able to discriminate, although
not in the direct manner which would
have been most useful to the clinician.


In discussing the Fake Bad group the Meehl-Rosen
paradox is apropos again, but will
not be discussed here. The interested reader
should refer back to the discussion relating to
the Fake Good group. There was no significant
difference between the Fake Bad and
Fake Bad Control groups in terms of number
of “lies” or number of changed answers.
However, there was a significant difference between
the groups in the expected direction in
subscale change, the Hs and Mf scale scores
dropping significantly more in the Fake Bad 
group than in their controls. There was no 
significant difference between Ss in the Fake 
Bad group and their controls in the relation- 
ship between number of “lies” and changes in 
the F-minus-K score, total number of changed 
answers and change in the F-minus-K, and 
number of times a “lie” occurred on the 
changed answers. 

There are several other points not directly
related to the testing of the two validating
scales which seem worthy of note. First, the
subscales which normally are K corrected
showed a significantly greater rise than the
non-K corrected subscales. This difference
does not seem attributable to regression to
the mean, because the K corrected scales' T
scores on the original test were not
significantly different from those of the non-K
corrected subscales. This finding seems to add
support to the belief that the K correction is
an aid in overcoming test-taking defensiveness.⁶


Table 4

Changes in T Scores for All Ss Combined Except Fake Bad Group


Second, the significantly smaller proportion
of “lies” that occurred on changed answers
when all the Ss were combined adds some
additional support to our “lie” analysis.

In conclusion it should be pointed out that
even with our small N,⁷ we obtained empirical
evidence supporting the use of the validating
scales. The “lie-detector” technique, while
no panacea for the problem of faking on
structured personality inventories, does offer a
promising experimental approach in this area.


Summary 


The MMPI was administered in group form
to approximately 300 undergraduate students.
From this group 13 were selected who showed
evidence of dissimulation. Nine appeared to
be “faking good” and four “faking bad.”
These 13 Ss were each carefully matched with
control Ss who had similar raw-score MMPI
profiles with no evidence of dissimulation. All
Ss were again given the MMPI while being
tested by a Keeler Polygraph. The findings
tended to support the value of the validating
scales, although many specific hypotheses were
not verified in a statistically significant manner.
Implications of these findings were discussed.


⁶ Fricke (4) suggests on the basis of his work on
response set that Ma was underweighted in the K
correction, Hs and Pd overweighted, and that K
should be subtracted from Hy. As Table 4 indicates,
Ma showed the largest increase of any scale, Hs
showed only a slight rise, and Hy showed the greatest
drop. Pd showed the second highest rise which
is not in accord with Fricke's suggestion. In view of
the fact that many subscale differences failed to
reach an acceptable level of statistical significance,
definite conclusions cannot, of course, be drawn;
however, it would seem that Fricke's hypotheses
concerning the underweighting of Ma, the overweighting
of Hs, and the subtraction of K from Hy are somewhat
supported and deserve more attention in a
study specifically directed to this problem.

⁷ Studies of this nature are like the proverbial
Mexican Army: many generals but few privates.
While there were 15,000 responses to analyze in
both the regular and experimental sessions, the total
N was 26.


Allen D. Calvin and Charles Hanley


Received January 11, 1957 


References 


1. Calvin, A., & McConnell, J. Ellis on personality
inventories. J. consult. Psychol., 1953, 17,
462-464.

2. Cofer, C. N., Judson, A. J., & Weick, D. V. On
the significance of the psychodiagnostic response
as an indicator of reaction to personality
test items. J. Psychol., 1949, 27, 347-354.

3. Cofer, C. N., Chance, June E., & Judson, A. J.
A study of malingering on the Minnesota
Multiphasic Personality Inventory. J. Psychol.,
1949, 27, 491-499.

4. Fricke, B. G. Response set as a suppressor variable
in the OAIS and MMPI. J. consult.
Psychol., 1956, 20, 161-169.

5. Goodstein, L. D. Regional differences in MMPI
responses among male college students. J. consult.
Psychol., 1954, 18, 437-441.

6. Gough, H. G. The F minus K dissimulation index
for the Minnesota Multiphasic Personality
Inventory. J. consult. Psychol., 1950, 14, 408-413.

7. Meehl, P. E., & Rosen, A. Antecedent probability
and the efficiency of psychometric signs, patterns,
or cutting scores. Psychol. Bull., 1955,
52, 194-216.

8. Sandler, J. A test of the significance of the difference
between the means of correlated measures,
based on a simplification of Student's t.
Brit. J. Psychol., 1955, 46, 225-226.

9. Sopchak, A. L. College student norms for the
Minnesota Multiphasic Personality Inventory.
J. consult. Psychol., 1952, 16, 445-448.





Journal of Applied Psychology 
Vol. 41, No. 5, 1957 


An Adjective Check List for the Study of “Product 
Personality” 


William D. Wells, Frank J. Andriuli, Fedele J. Goi, and Stuart Seader 


Rutgers University, Newark Colleges


An interesting characteristic of well-known 
products is their tendency to become associ- 
ated with particular kinds of people. The 
Cadillac automobile, for example, is a national 
symbol of wealth and worldly success. Tea is 
associated with old maids and society ma- 
trons, while Coca Cola is associated with 
exuberant youth. Marlboro cigarettes, once 
considered elegant and feminine because of 
their length and filter tip, are now being con- 
verted into a he-man smoke by advertising 
which associates Marlboro with ranchers, 
hunters, and other rugged outdoor types. 

The associations which surround particular 
products are important because they influence 
sales. This influence is especially strong when 
competing products are physically very simi- 
lar (as cigarettes and cola drinks are), but it 
is often present even when clear-cut physical 
differences exist. Sometimes associations to 
specific brand names are so important that 
one brand is preferred in “blind” product 
tests, while a competing brand is preferred 
when labels are attached. 

One way to study these associations is to 
study the public mental image—the stereo- 
type—of the product user. Several techniques 
can be used to get this information, but one 
of the simplest and easiest to handle is an 
Adjective Check List. 

The Adjective Check List presented in this 
report was designed to be short enough for 
use in door-to-door surveys, and to contain 
only words likely to be understood by most 
survey respondents. It was compiled in the 
following way: A preliminary selection was 
made of all the trait names occurring 50 or 
more times per million in the Thorndike and 
Lorge word counts. The 50 or more per million
point was selected because it distinguishes
words “taught for permanent knowledge” in
the first, second, and third grades (1, p. xi),
words which are therefore likely to be widely understood.
The preliminary list was edited by dropping
words which seemed unlikely to describe users
of particular products (bare, dead, evil, holy,
mad, etc.), words which usually refer to objects
rather than people (clear, deep, dry,
hollow, etc.), and words which are highly
ambiguous out of context (distant, free, fresh,
promising, etc.). “Handsome,” “beautiful,”
and “pretty” were replaced by “good looking,”
an expression which can refer to either
sex. “Masculine,” “feminine,” “high class,”
“middle class,” “low class,” and “old fashioned”
were included, even though they did
not meet the usage criterion, because previous
experience had shown these words to be very
important in designating product user
stereotypes.
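The compilation steps just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the frequency figures below are invented placeholders, not values from the Thorndike and Lorge counts, and the function and variable names are hypothetical. The exclusion sets contain only the example words named in the text, and the alphabetical ordering is for convenience rather than the order used on the printed list.

```python
# Hypothetical per-million frequencies; NOT taken from Thorndike and Lorge.
freq_per_million = {
    "friendly": 80, "strong": 100, "bare": 60, "dead": 100,
    "clear": 100, "distant": 55, "handsome": 50, "pretty": 90,
    "masculine": 12, "meticulous": 2,
}

CUTOFF = 50  # occurrences per million, the criterion stated in the text

# Judgment-based exclusions (only the examples named in the text).
unlikely_for_users = {"bare", "dead", "evil", "holy", "mad"}
refer_to_objects = {"clear", "deep", "dry", "hollow"}
ambiguous = {"distant", "free", "fresh", "promising"}
excluded = unlikely_for_users | refer_to_objects | ambiguous

# Words replaced by a single expression usable for either sex.
merged = {"handsome": "good-looking", "beautiful": "good-looking",
          "pretty": "good-looking"}

# Words included despite failing the usage criterion.
forced_in = {"masculine", "feminine", "high-class", "middle-class",
             "low-class", "old-fashioned"}


def compile_check_list(freqs):
    """Apply the cutoff, the exclusions, the merges, and the forced inclusions."""
    kept = set()
    for word, freq in freqs.items():
        if freq < CUTOFF or word in excluded:
            continue
        kept.add(merged.get(word, word))
    return sorted(kept | forced_in)


check_list = compile_check_list(freq_per_million)
print(check_list)
```

The sketch makes the compromise visible: the cutoff is mechanical, while the exclusion, merge, and forced-inclusion sets all encode personal judgment.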

The final list, then, was something of a
compromise between strict adherence to a cutoff
criterion and free exercise of personal judgment.
It contained the following words in
order: friendly, strong, angry, fat, popular,
slow, good, married, strange, patient, modern,
secure, fair, vain, tender, comfortable, particular,
merry, small, honest, poor, natural,
sharp, serious, masculine, different, gay, big,
silent, gentle, young, set, cold, superior, firm,
average, clean, rough, simple, traveled, dangerous,
understanding, wise, hard, content,
old-fashioned, successful, cheap, soft, middle-class,
able, warm, democratic, thin, good-looking,
common, quiet, fancy, old, feminine,
practical, moral, cross, kind, difficult, thinking,
powerful, careful, low-class, tired, important,
foreign, interesting, little, brave, rich,
plain, bright, weak, loud, busy, nice, original,
happy, heavy, smart, active, correct, calm,
curious, proud, bitter, cool, single, pleasant,
bad, steady, famous, religious, funny, wonderful,
fine, independent, smooth, ordinary, tall,
high-class, and sad.


Method 


An adjective check list can be used in a number of
ways, each with its own assets and liabilities. One
way is to designate the type of person to be rated
(a Cadillac Owner, for example) and ask each respondent
to check the words on the list which describe
the type. This system is clear and simple, but
it encourages respondents to make discriminations
only when they feel quite confident. As will be
shown below, respondents can make meaningful
discriminations even when they do not believe they can.
A further disadvantage of this system is that some
respondents are overcautious or uninterested and
check only a few words, while other respondents,
differently motivated perhaps, check many. This
fact makes responses difficult to evaluate statistically,
and it makes comparisons among different groups of
respondents somewhat questionable.

Another way to use an adjective check list is to 
designate several types of persons to be rated (a 
Cadillac Owner, a Buick Owner, and a Chevrolet 
Owner, for example) and ask each respondent to 
check the kind of person each adjective describes 
best. This system has the advantage of forcing a 
discrimination for each adjective on the list, but it 
has the disadvantage of making the judgment rela- 
tive rather than absolute. It is possible to modify 
this system by allowing omissions or multiple checks, 
but such modifications reintroduce the problems
associated with individual or group differences in
willingness to respond.

Considering the advantages and disadvantages on 
both sides, it was decided that the needs of the pres- 
ent study could best be met by using forced choice 
with no multiple checks and no omissions. 

The respondents were 100 fraternity members at
Rutgers University, Newark Colleges. The stereotypes
investigated were those associated with well-known
automobiles. The respondents completed the
adjective list twice, choosing first among Cadillac
Owner, Buick Owner, and Chevrolet Owner; then
among Chevrolet Owner, Plymouth Owner, and Ford
Owner. The ratings were obtained in October, 1956,
just before the introduction of the 1957 models.

To avoid systematic fatigue effects, the total list
of words was divided into five sub-lists, and the
sub-lists were rotated in order. In addition, the
columns containing the car names were systematically
varied so that each car appeared first, second,
and third an approximately equal number of times.
The purpose of this precaution was to balance out
the effects of any tendency to check the first, last, or
middle column when in doubt.
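One way such counterbalancing could be arranged is sketched below in Python. The report does not give the actual rotation scheme, so the assignment rule here (rotating the sub-lists by respondent number and cycling through the six possible column orders) is an assumption made for illustration, and all names are hypothetical.

```python
from itertools import permutations


def rotate(items, k):
    """Return items rotated left by k positions."""
    k %= len(items)
    return items[k:] + items[:k]


def make_form(respondent_index, sub_lists, cars):
    """Return (sub-list order, column order) for one respondent."""
    ordered_sub_lists = rotate(sub_lists, respondent_index)
    column_orders = list(permutations(cars))  # 6 orders for 3 cars
    columns = column_orders[respondent_index % len(column_orders)]
    return ordered_sub_lists, columns


sub_lists = ["A", "B", "C", "D", "E"]  # placeholders for the five sub-lists
cars = ["Cadillac Owner", "Buick Owner", "Chevrolet Owner"]
forms = [make_form(i, sub_lists, cars) for i in range(100)]

# Count how often each car lands in the first, second, and third column.
position_counts = {car: [0, 0, 0] for car in cars}
for _, columns in forms:
    for position, car in enumerate(columns):
        position_counts[car][position] += 1

print(position_counts)
```

With 100 respondents and six column orders the counts cannot be exactly equal, which is in keeping with the "approximately equal" wording of the text.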


Results and Discussion 
Although results obtained from 100 fraternity
members at one metropolitan college can
hardly be thought of as characteristic of the
consumer population, the patterns which the
responses form do show some things about the
way the list works. Space limitations prevent
reproduction of a complete table of results.¹
However, a list of the traits most often associated
with the various car owners will show
the nature of the stereotypes as they emerge.
In the lists which follow, trait names are ordered
by frequency of mention. All frequencies
exceed chance expectations at the .01 level
or beyond.
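To give a sense of what "exceeding chance at the .01 level" implies under forced choice among three owners, the following Python sketch finds the smallest number of mentions out of 100 that a one-tailed binomial test would call significant. The report does not name the test that was used; the binomial model with a chance probability of one third is an assumption, and the function names are hypothetical.

```python
from math import comb


def upper_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def min_count_for_significance(n, p, alpha):
    """Smallest k such that P(X >= k) <= alpha, or None if there is none."""
    for k in range(n + 1):
        if upper_tail(n, k, p) <= alpha:
            return k
    return None


n_respondents = 100   # respondents in the study
p_chance = 1 / 3      # three owners per comparison, one forced choice each
threshold = min_count_for_significance(n_respondents, p_chance, 0.01)
print(threshold)
```

Any adjective assigned to one owner by at least `threshold` of the 100 respondents would meet the criterion under these assumptions.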

Compared with the Buick Owner and the 
Chevrolet Owner, the Cadillac Owner was 
most often described as: rich, high-class, fa- 
mous, important, fancy, proud, superior, and 
successful—as expected. He was also (in the 
minds of the college student respondents) 
cold, vain, particular, big, fat, and powerful. 
The Buick Owner stereotype was less defi- 
nite, but still consistent: middle-class, brave, 
masculine, strong, modern, and pleasant. And 
(compared with the Cadillac Owner and the 
Buick Owner) the Chevrolet Owner was called 
poor, low-class, ordinary, plain, and simple. 
Also, practical, common, average, cheap, thin, 
little, friendly, and small. 

In the Ford-Plymouth-Chevrolet compari- 
son, the Ford Owner appeared as a ball of 
fire: masculine, young, powerful, good-looking, 
rough, dangerous, strong, single, merry, loud, 
active, cool, tall, interesting, sharp, and popu- 
lar. By comparison, the Plymouth Owner 
was somewhat of a turtle: quiet, careful, slow, 
silent, moral, fat, gentle, calm, sad, thinking, 
patient, honest, understanding, and content. 
The clarity of these stereotypes is especially 
interesting in view of the fact that many of 
the respondents insisted—some quite vehe- 
mently—that they could not make discrimi- 
nations among the owners of the three “low- 
priced” cars. 

The picture of the Chevrolet Owner in
the Ford-Plymouth-Chevrolet comparison illustrates
one of the difficulties in using forced
choice: when one stereotype is especially
strong (as the Ford stereotype was in this
case), the strong stereotype tends to drive out
a weaker one. The Chevrolet column in this
comparison shows only ordinary, fair, and
common significantly above chance at the .01
level. In such a situation, it is helpful to
look at the words not often chosen as representative,
although it is necessary to remember
that a word may be chosen infrequently
because it does not fit into the stereotype in
question, because it fits much better into a
competing stereotype, or both.


¹ A complete tabulation of the responses to each
adjective has been deposited with the American
Documentation Institute. Order Document No. 5319 from
ADI Auxiliary Publications Project, Photoduplication
Service, Library of Congress, Washington 25, D. C.,
remitting in advance $1.25 for microfilm or $1.25
for photocopies. Make checks payable to Chief,
Photoduplication Service, Library of Congress. A
mimeographed copy of the table may be obtained
free of charge by writing to William D. Wells,
Rutgers University, Newark Colleges, Newark 2, New
Jersey.

Words chosen infrequently to represent the 
Chevrolet Owner were: strong, vain, smooth, 
sharp, young, difficult, powerful, modern, and 
successful—all but two (difficult and modern) 
part of the Ford picture. Words chosen in- 
frequently to represent the Plymouth Owner 
were: dangerous, masculine, single, rough,
merry, loud, good-looking, young, popular,
and cool. And, as might be expected from 
what has gone before, the Ford Owner was 
not looked upon as moral, plain, or sad. 


Summary 


This report presented an Adjective Check 
List designed for survey use in the study of 
“product personality.” Stereotypes associ- 
ated with the owners of well-known automo- 
biles were given as an example of the kind of 
data the list produces. 


Received December 19, 1956 


Reference 
1. Thorndike, E. L., & Lorge, I. The teacher's word
book of 30,000 words. New York: Columbia
Univer., Teachers Coll. Bureau of Publications,
1944.








Journal of Applied Psychology
Vol. 41, No. 5, 1957 


An Experimental Test of the Effects of “Developmental” vs. 
“Free” Discussions on the Quality of Group 
Decisions 


Norman R. F. Maier 


University of Michigan 


and Richard A. Maier 


Emory University 


The purpose of this experiment was to com- 
pare the effects of two discussion leadership 
techniques on the quality of group decision. 
One technique is a “free” discussion in which 
the leader poses the problem, then conducts 
the discussion in a permissive manner with- 
out making value judgments, but merely helps 
the group reach agreement on a solution. The 
other technique is the “developmental” dis- 
cussion in which the leader, in addition to the 
above, breaks the problem into parts so that 
all members will consider various aspects of 
the problem simultaneously. This breakdown 
should insure systematic coverage of the vari- 
ous phases and simultaneous consideration of 
each phase by the group members. 


Procedure 


The Problem 


The problem selected for this investigation
is “The Case of Viola Burns.”¹ The facts of
the case include (a) a general description of
Viola's personality and intelligence, her job
duties, her job performance, and her relations
with others; (b) an interview between Viola's
boss and the personnel manager who wishes to
promote Viola; (c) an interview between
Viola and the personnel manager during which
the new job is described and Viola requests
time to consider taking the new job; and (d)
a second interview between Viola's boss and
the personnel manager during which Viola's
behavior subsequent to the job offer is described.

As the case stands, Viola is in a state of
indecision and either should be encouraged to
take the new job or discouraged from taking
it. This is the problem that is put to the
discussion groups. In order to rule out the problem
of conflicting interests, the decision to encourage
or discourage Viola is to be made
first from the point of view of the company,
and then from the point of view of Viola.


¹ This case is taken from Pigors, McKenney, and
Armstrong (4). Its adaptation for “free” and
“developmental” discussion is described in detail
elsewhere (1).

The case, as presented, makes one like 
Viola because she is cooperative, conscientious, 
and shy. Since she has shown merit on her 
present job, one is inclined to see the pro- 
motion as a reward for a deserving employee. 
Thus the case tends to produce a similar bias 
in all persons exposed to the facts. 

However, the new job is very different from 
the one on which Viola has been successful. 
It requires an extrovert, whereas Viola is far 
from this. Furthermore, the interview with 
Viola and her subsequent behavior indicate 
that Viola is disturbed by the offer. Thus a 
consideration of Viola’s qualifications and of 
the two job descriptions leads one to conclude 
that Viola would neither be happy nor suc- 
cessful on the new job. 


Subjects 


Students (undergraduates) from a course in psy- 
chology of management served as subjects. The 
case was presented during one of the weekly labora- 
tory periods and followed the text book treatment of 
the importance of matching people and jobs. Each 
laboratory group of 12 to 24 students was divided 
into discussion groups of 4 or 5 persons. The dis- 
cussion leader of each group was arbitrarily deter- 
mined by the laboratory instructor who followed a 
prearranged plan in naming the leader (e.g., member 
of a group seated nearest the instructor). 

Since the experiment was performed at about the 
middle of the semester the subjects were familiar 
with problems involving attitudes, job placement, 
and especially with group decision. All had been in 
role-playing groups in which the leader attempted to 
practice the group decision approach. 




Discussion Styles 


The groups were arbitrarily numbered and all odd- 
numbered groups called “D” groups while even-num- 
bered groups were called “F” groups. This random 
procedure yielded 45 “D” groups of 194 persons and 
41 “F” groups of 180 persons. The instructions for 
the “D” (developmental discussion) and “F” (free 
discussion) groups are given below.

Instructions for the “free” discussion leaders were 
as follows: 


You are meeting to decide whether or not Viola
should, in the next interview, be encouraged or
discouraged about taking the new job. The case as
presented gives all the known facts. This instruction
sheet tells you how to conduct the meeting.
In general,


A. Try to get everyone to voice their views and
to give reasons for their ideas. Encourage
interaction of ideas.


B. Do not impose your views on the group. Be
as permissive as you can.


C. See if you can get agreement on the
recommendations made.


D. Get a final vote on the recommendations and
be ready to report them to the class.


Lead the discussion to decide the following:


1. From the point of view of the good of the
company, we recommend that Viola be [discouraged
from taking] [encouraged to take] the new
job.

2. From the point of view of Viola's best welfare,
we recommend that she be [discouraged from taking]
[encouraged to take] the new job.


The first portion of the instructions for the
“developmental” discussion leader was the same as that
in the instructions for the “free” discussion leaders.
After Item D, the following was inserted:


To assist in making the final decision, obtain
unanimous group decisions on each of the preliminary
problems below.


Problem 1. Develop a list of Viola's activities on
her present job.

Problem 2. Grade Viola's proficiency on each with
letters A, B, C, D, or E, and write the grade after
the activity.

Problem 3. Develop a list of activities Viola would
be expected to perform on the new job.

Problem 4. Grade how well your group thinks
Viola will do on each.

Problem 5. Select the three activities Viola's new
boss will consider most important for the success
of his office.

After this analysis, lead the discussion to decide
the following: [The two decision issues which
followed were identical with those for the “free”
discussion group.]


The instructor always met with the group leaders
prior to the group discussions in order to clarify the
instructions.

Groups were given as much time as they wished
in order to come to a decision. In most instances
half an hour was adequate. After the groups had
completed their discussions, group members, including
the leader, were requested to report their opinions
separately on a form that was supplied and
then place them together in an envelope. Unanimity
of opinion, therefore, was determined by the
experimenter. The possibility that there was a difference
between decisions based on the company's view and
Viola's view also was investigated.


Results 

The effects of the two types of discussion
processes on the quality of the decisions
reached are shown in Table 1. It will be
seen that fewer persons who participated in
the “developmental” discussions wished to
encourage Viola to take the new job than persons
who participated in the “free” discussion.
Whether the decision considered the company's
viewpoint or Viola's viewpoint made
little difference, however. For both conditions,
the results of the “developmental” and the
“free” discussions differ significantly (.001
level of confidence) according to chi-square
analysis.


Table 1

Effects of “Developmental” vs. “Free” Discussion Groups on Decision Reached

                                   “Developmental”   “Free”
Viewpoint             Decision     Discussion        Discussion     Difference

Interest of company   Encourage    117 (60.3%)       146 (81.1%)    chi-square = 19.29
                      Discourage    77 (39.7%)        34 (18.9%)    p = .001

Interest of Viola     Encourage    117 (60.3%)       143 (79.4%)    chi-square = 16.21
                      Discourage    77 (39.7%)        37 (20.6%)    p = .001


Although discouraging Viola from taking
the new job is regarded as the better decision,
it will be noted that this decision is reached
by only 39.7 per cent of the persons in the
“developmental” groups and by a mere 18.9
per cent of the persons in the “free” discussion
groups. It is apparent that the “developmental”
discussions merely weaken the majority
trend rather than reverse it. A skilled
discussion leader, however, can actually reverse
the trend, but in these instances the
leader knows the preferred decision and hence
may be in a position consciously or unconsciously
to influence the outcome. In the
present experiment the group leaders' decisions
were in line with those of their groups
and they never were a minority of 1 in a
group's decision.

The question of whether the formalizing of
the discussion (i.e., breaking it into sub-topics)
required by the “developmental”
procedure served as a disturbing or facilitating
factor in making for agreement was determined
by obtaining the ratio of unanimous
to split decisions that resulted from the two
types of discussion. These results are shown
in Table 2. Although the “free” discussion
procedure yielded a slightly higher frequency
of unanimous decisions than the “developmental,”
the difference is not significant.
This is true for the results obtained from the
company's viewpoint as well as from Viola's
viewpoint. The trend, however, suggests that
the difference might become significant if a
problem having greater emotional involvement
were utilized. In such case, the need for
skilled “developmental” discussion leaders
would be increased.


Table 2

Effects of “Developmental” vs. “Free” Discussion on Unanimity of Decisions

                      Degree of    “Developmental”   “Free”
Viewpoint             Agreement    Discussion        Discussion

Interest of company   Unanimous    29                33
                      Split        16                 8

Interest of Viola     Unanimous    27                32
                      Split        18                 9
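The chi-square comparisons behind Tables 1 and 2 can be checked with a short Python sketch. It uses the ordinary Pearson formula for a fourfold table with one degree of freedom; the authors do not say whether a continuity correction or any rounding was applied, so the values it gives need not match the published 19.29 and 16.21 exactly. The function name is hypothetical, and the critical values are the standard ones for one degree of freedom.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


CRITICAL_05 = 3.841    # chi-square, 1 df, .05 level
CRITICAL_001 = 10.828  # chi-square, 1 df, .001 level

# Table 1: persons encouraging vs. discouraging, "developmental" vs. "free".
decision_company = chi_square_2x2(117, 146, 77, 34)
decision_viola = chi_square_2x2(117, 143, 77, 37)

# Table 2: groups reaching unanimous vs. split decisions.
unanimity_company = chi_square_2x2(29, 33, 16, 8)
unanimity_viola = chi_square_2x2(27, 32, 18, 9)

for label, value in [
    ("decision, company", decision_company),
    ("decision, Viola", decision_viola),
    ("unanimity, company", unanimity_company),
    ("unanimity, Viola", unanimity_viola),
]:
    print(f"{label}: chi-square = {value:.2f}, "
          f"beyond .001 level: {value > CRITICAL_001}")
```

Under these assumptions the decision comparisons fall well beyond the .001 critical value while the unanimity comparisons stay below the .05 critical value, in line with the conclusions stated in the text.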


Discussion 

The results obtained are in line with those
expected in that the dynamics of the “developmental”
discussion lead to higher quality
decisions than do those of the “free” discussions.
This fact supports the view that the
developmental type discussion should upgrade
quality because it synchronizes thinking and
assures systematic coverage. However, it is
clear that it does not assure high quality decisions
since the majority of the decisions still
lacked quality. This deficiency may have to
be corrected by increases in skill.

It might be objected that the develop- 
mental leader is more apt to dominate the 
group. This criticism is certainly valid when 
the discussion leader has a decision in mind 
which is of higher quality than that of the 
average person. The fact that (a) the leaders
were chosen at random, (b) their decisions
were no better than that of group members,
and (c) their groups did not differ significantly
from the free discussion groups in
the number of unanimous decisions makes
this interpretation untenable.

It is possible that a different breakdown of 
the problem might alter the quality of the 
decision. However, that would require fur- 
ther experimentation. It is likewise possible 
that the developmental approach might arouse 
more resistance and suspicion, especially if 
there was considerable emotional involvement. 
This might be reflected in a decreasing num- 
ber of unanimous decisions. This, again, 
could be approached experimentally. 

The skill of the discussion leader is always
a factor influencing both the acceptance and
the quality of decisions (2, 3). In this experiment
the skill factors were equated in
that the leaders were chosen at random.
Since all leaders were familiar with the group
decision method, they were by no means
naive, but it must also be granted that they
were by no means highly proficient. The results
obtained therefore reflect the effects of
two group decision techniques when used with
a limited amount of skill. The difference in
the effects of the two discussion methods
would undoubtedly be greater if placed in the
hands of skilled leaders since each method
would be diluted to a lesser degree by the
leader's natural approach. From this it
would follow that even greater differences
could be expected if the experiment were repeated
with skilled leaders.


Summary 


A “developmental” and a “free” type of
discussion leadership were compared with respect
to the degree to which they influence
the quality and unanimity of group decisions.
Small groups of college students were used as
subjects and each group was asked to make
a decision involving the wisdom of promoting
a particular female employee to a new job.
Decision quality was not assured in this case
because the desire to reward the employee for
her competence on the present job inclines
people to violate basic principles of job placement.

Eighty-six groups of either four or five
persons were used. The leaders of forty-five
of the groups were asked to follow the
“developmental” discussion plan whereas the
leaders of the other forty-one groups were
asked to follow the “free” discussion plan.
The instructions for the “free” discussion
leader requested him to obtain a group decision
on whether to encourage or discourage
the employee with respect to the new job
offer, while the instructions for the “developmental”
discussion leader requested him to
obtain a group decision on five sub-topics before
making the final decision.

A significant difference was obtained with 
regard to the decision reached by the two 
types of groups. The percentage of persons 
reaching the high quality decision in the “de- 
velopmental” discussion groups was 39.7 as 
compared to 18.9 for the “free.” This dif- 
ference may be expected to be even greater 
with more skilled leaders. No significant dif- 
ference was obtained with regard to the fre- 
quency of unanimous decisions in the two 
types of groups. 

It is the opinion of the writers that the
above findings apply only to problems in
which emotional involvement is not an important
aspect of the problem. It is believed
that with other types of problems the “free”
type of discussion may be more effective than
the “developmental.” The superiority of the
“developmental” discussion seems to depend
upon two things: it assures systematic coverage
of the topic and it synchronizes the discussion
so that all members tend to talk about
the same thing at the same time.


Received November 29, 1956. 


References 


1. Maier, N. R. F. Principles of human relations.
New York: Wiley, 1952.

2. Maier, N. R. F. The quality of group decisions
as influenced by the discussion leader. Human
Relat., 1950, 3, 155-174.

3. Maier, N. R. F., & Solem, A. R. The contribution
of a discussion leader to the quality of
group thinking: the effective use of minority
opinions. Human Relat., 1952, 5, 277-288.

4. Pigors, P. J. W., McKenney, L. C., & Armstrong,
T. O. Social problems in labor relations.
New York: McGraw-Hill, 1939.





Journal of Applied Psychology 
Vol. 41, No. 5, 1957 


Estimation of the Reliability of Average of Rankings 


Harold A. Edgerton 


Richardson, Bellows, Henry & Co., Inc.


A common practice in obtaining an order of
merit of persons is to rank them. Replication
of the rankings and averaging of the ranks
assigned to each individual is done as a way of
improving the quality of the estimated order
of merit. There are variations of the method,
such as using ratings requiring a predetermined
distribution.

Such estimates, to be useful, must have
sufficient stability or reliability. This note
shows a simple method of estimating the
reliability of such averaged values. It is necessary
that the individual rank series or ratings
by each ranker or rater have equal means and
equal standard deviations.

Let us assume that there are $n$ sets of ranks,
each rank being expressed as a deviation from
its mean. The standard deviation of the
averaged ranks of $N$ persons by $n$ rankers is

$$\sigma_a = \sqrt{\frac{\sum \left( \dfrac{x_1 + x_2 + \cdots + x_n}{n} \right)^2}{N}} \tag{1}$$

Expanding this expression,

$$\sigma_a^2 = \frac{1}{n^2 N} \left[ \sum x_1^2 + \sum x_2^2 + \cdots + \sum x_n^2 + 2\left( \sum x_1 x_2 + \sum x_1 x_3 + \cdots + \sum x_{n-1} x_n \right) \right] \tag{2}$$

Since the standard deviations of all the component
sets of rankings are equal, each may be
denoted by $\sigma_r$, then

$$\sigma_a^2 = \frac{\sigma_r^2}{n^2} \left[ n + 2\left( r_{12} + r_{13} + \cdots + r_{n-1,n} \right) \right] \tag{3}$$

Let $\bar{r}$ be the average of the correlations
among the several component sets of rankings,
then

$$\sigma_a^2 = \frac{\sigma_r^2}{n^2} \left[ n + n(n-1)\,\bar{r} \right] \tag{4}$$

Solving for $\bar{r}$,

$$\bar{r} = \frac{\dfrac{n\,\sigma_a^2}{\sigma_r^2} - 1}{n - 1} \tag{5}$$
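The step from Equation 1 to Equations 4 and 5 can be checked numerically. The sketch below is an illustration only: it generates synthetic ratings (not data from this note), rescales each set to a common mean and standard deviation as the method requires, and compares the observed standard deviation of the averages with the value given by Equation 4. The function names and the simulated "true merit" model are assumptions made for the example.

```python
import math
import random


def pop_sd(values):
    """Population standard deviation (divides by N, as in Equation 1)."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def standardize(values, sd_target=10.0):
    """Express values as deviations from their mean, rescaled to a common SD."""
    m = sum(values) / len(values)
    s = pop_sd(values)
    return [(v - m) / s * sd_target for v in values]


def correlation(x, y):
    """Product-moment correlation of two equally long series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pop_sd(x) * pop_sd(y))


def average_intercorrelation(sets):
    """Mean of the n(n-1)/2 correlations among the component sets."""
    n = len(sets)
    rs = [correlation(sets[i], sets[j]) for i in range(n) for j in range(i + 1, n)]
    return sum(rs) / len(rs)


# Synthetic example: N persons, n raters, each rater sees merit plus noise.
random.seed(0)
N, n, sd_r = 86, 4, 10.0
true_merit = [random.gauss(0, 1) for _ in range(N)]
sets = [standardize([t + random.gauss(0, 1) for t in true_merit], sd_r)
        for _ in range(n)]

averaged = [sum(s[p] for s in sets) / n for p in range(N)]
sd_a_observed = pop_sd(averaged)

r_bar = average_intercorrelation(sets)
sd_a_from_eq4 = math.sqrt(sd_r ** 2 / n ** 2 * (n + n * (n - 1) * r_bar))
r_bar_from_eq5 = (n * sd_a_observed ** 2 / sd_r ** 2 - 1) / (n - 1)
```

Because the identity is algebraic, the two standard deviations agree to rounding error whenever the component sets share the same mean and standard deviation, which is the condition stated above.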

To show how this function may be used, a
recent study may be cited. In this study, four
rankings were obtained for each of 86 first-line
foremen. Each series of ranks was converted
to “rank scores” with a mean of 30 and
standard deviation of 10. The observed
standard deviation of the averaged ranks was
8.31. By using Equation 5, the average intercorrelation
among the rankings was computed.

$$\bar{r} = \frac{\dfrac{4\,(8.31)^2}{(10)^2} - 1}{4 - 1} = \frac{1.7622}{3} = 0.587$$

This is the average value of the reliability of a
single set of rankings. Applying the Brown-Spearman
Prophecy formula, the estimated
reliability of the averaged ranks is 0.850.
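The arithmetic of this example can be retraced in a few lines. The sketch below is a minimal illustration, taking the standard deviation of 8.31 that appears in the substitution, four rankers, and a component standard deviation of 10; the function names are chosen for the example and are not part of the original note.

```python
def average_intercorrelation(sd_averaged, sd_single, n_rankers):
    """Equation 5: average correlation among the component rankings."""
    return (n_rankers * sd_averaged ** 2 / sd_single ** 2 - 1) / (n_rankers - 1)


def stepped_up_reliability(r_single, k):
    """Reliability of the average of k sets, each with reliability r_single."""
    return k * r_single / (1 + (k - 1) * r_single)


r_bar = average_intercorrelation(sd_averaged=8.31, sd_single=10.0, n_rankers=4)
reliability = stepped_up_reliability(r_bar, k=4)
print(round(r_bar, 3), round(reliability, 2))
```

Carried at full precision the average intercorrelation rounds to .587 and the stepped-up value comes to about .85, in line with the figure reported; small differences in the third decimal depend on where the intermediate value is rounded.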

This function seems particularly useful in the 
situation which is commonly found when using 
averaged ranks or rank scores as a criterion of 
on-the-job performance for validating selection 
tests. As a rule, no one rater knows the
performance of all foremen well enough to rank
them with confidence or reliability. It is
possible to obtain several rankings for each of
the foremen by subdividing them among rankers
who are competent to rank them. This
arrangement does not permit the usual method
of computing reliability, but does meet the 
conditions required by the formula derived 
above, when all subjects have been ranked by 
the same number of rankers. 


Received January 30, 1957 





Journal of Applied Psychology 
Vol. 41, No. 5, 1957 


A Comparison of Two Modes of Prosthetic Prehension
Force Control by Arm Amputees 1, 2


Hilde Groth and John Lyman 


University of California, Los Angeles 


The rationale for the evaluation of func- 
tional regain in prosthetic systems for arm 
amputees is based on the primary purpose of 
a prosthesis—namely, to provide a functional 
substitute for the lost limb. In an earlier ex- 
periment (5) time of performance was used 
as a criterion for comparing “voluntary open- 
ing” (VO) and “voluntary closing” (VC) 
principles of prosthetic terminal device con- 
trol. This criterion measure failed to differ- 
entiate between the two modes of control. 
However, the empirical evidence could not be 
considered complete for refuting the view that 
voluntary closing mechanisms are to be pre- 
ferred for prosthesis prescriptions as two addi- 
tional arguments brought up in favor of this 
system (2, 4) bear directly on its adequacy 
as a functional substitute. The arguments 
are: 


1. VC provides variable prehension force 
permitting the amputee to grade his output 
force according to task requirements. 

2. VC devices provide a wider range of use- 
ful forces since unlike VO devices their maxi- 
mum force is not limited by the spring ten- 
sion that keeps the device closed. 

The object of the present study is to test
the validity of these two arguments. In connection
with the first argument it is our major
experimental hypothesis that prehension
force can be adjusted equally well with VO
and VC mechanisms since the same control
motions will result either in holding back
against the spring force in the VO device or
in active closure of the VC device. Our hypothesis
with respect to the second argument
is that in real life situations the additional
utility of a wider range of forces available in
VC devices is not of practical significance to
the amputee.


1 This work was supported by Contract No. VAm
23110 between the Veterans Administration and the
Department of Engineering, University of California,
Los Angeles. The opinions expressed are those of the
authors and do not necessarily represent those of the
Veterans Administration.

2 The writers wish to thank Dr. C. L. Taylor, Project
Leader, for his advice and recommendations during
this investigation. We are also indebted to Mr.
J. R. Zweizig for the development, calibration and
maintenance of the instrumentation.


Procedure 
Subjects 


Twenty locally resident arm amputees were divided
into two groups. Group I consisted of ten
regular wearers of the VO device and Group II of
ten regular wearers of the VC device. Each of these
groups consisted further of five Ss whose left arm
had been amputated and five whose right arm had
been amputated. For the tests each S wore the type
of device to which he was accustomed in daily life.


Apparatus and Tests 


A left and a right Northrop-Sierra two-load hook
(VO) and a left and right APRL hook (VC) were
instrumented with Baldwin-Lima SR-4 strain gages.
These gages were mounted on the immovable finger
of each device and calibrated in terms of tip prehension
force. A strain gage analyzer and Brush oscillograph
combination permitted continuous recording of
prehension force (1).

The APRL hooks were modified by removal of the
locking mechanism in order to exclude mechanical
artifacts. This made it possible to determine the differential
effects due solely to the mode of operation,
i.e., “pull-to-open” or “pull-to-close.”

Very simple manipulation and holding tests were
used in order to minimize practice effects. The manipulation
tests were: thirty blocks of the Minnesota
Rate of Manipulation Placing Test (MRMT); pick-up,
transport, and release of ten drinking straws;
pick-up, transport, and release of twelve paper cups
(Lily cups). The holding tests required a stationary
hold for a three-minute period of the following objects:
a block of the MRMT; a drinking straw; a
paper cup; a hard-cover composition book. As with
the earlier investigation we attempted to use tasks
which seemed “realistic” to the amputees and would
facilitate motivation as well as provide the necessary
empirical data for testing our hypotheses. Figure 1
gives an over-all view of the tests and the work
space.


Routine 


The Ss were naive as to the purpose of the experiment.
All tests were administered in a Subject by
Treatment randomized block design after standard
instructions had been read to them. The manipulation
tests were administered with S standing. For
the holding tests S was sitting with his upper arm
parallel to his side and his elbow flexed at a 90°
angle. Objects were held so that torque from gravitational
force did not become a prehension force
artifact.


Fig. 1. Work space and tests.

After completion of the tests each S was interviewed.
The interview was recorded on a tape recorder.
The Critical Incident Technique (3) was
used and the interview was structured in such a way
as to obtain actual events in an amputee’s daily life
in which he used or did not use his terminal device.


Fig. 2. Mean prehension forces and variabilities
for ten VO and ten VC hook wearers on manipulation
tests. (Ordinate: prehension force in kg; plot
labels: Indentation, Collapse.)


Results 3


All statistical comparisons were made by
means of nonparametric tests (6). The significance
level was set at P = .01. The results
were as follows:


1. Comparison of the mean prehension forces 
between VO and VC hook wearers on the 
three manipulation tests indicated that there 
was no significant difference in the absolute 
level of force exerted. This result is shown 
in Fig. 2. 

2. There was no statistically significant dif- 
ference between the mean prehension forces 
used by VO and VC hook wearers as indi- 
cated by the holding tests. This result is pre- 
sented in Fig. 3. 

3. Comparison of manipulation and holding
tests with respect to difference in prehension
force for each object did not reach significance
except for the MRMT block. In
this case the VO wearers allowed the block to
be held with maximum spring force during the
holding tests while during the manipulation
test on the MRMT they “held back” and
exerted submaximal force. Figures 4 and 5
illustrate these results.

4. Assessment of the interviews indicated
that the greater range of forces available to
VC wearers was of no practical value. All
amputees in the sample, regardless of the
mechanism they wore regularly, preferred
commercially available VO steel hooks for
heavy work, adding rubber bands for greater
prehension force as necessary. This result
must be qualified by pointing out that from
the comments made it was apparent that this
choice was at least partly due to fear of damage
to the aluminum construction of the most
commonly available VC hook. Four of the
VC wearers claimed they had damaged their
hook by springing the aluminum fingers.


Fig. 3. Mean prehension forces and variabilities for
ten VO and ten VC hook wearers on hold tests.
(Ordinate: prehension force in kg; plot labels:
Indentation, Collapse; objects: MRMT block, straw,
cup, book.)


3 Detailed statistical tables have been deposited
with the American Documentation Institute. Order
Document No. 5318 from ADI Auxiliary Publications
Project, Photoduplication Service, Library of
Congress, Washington 25, D. C., remitting in advance
$1.25 for microfilm or $1.25 for photocopies. Make
checks payable to Chief, Photoduplication Service,
Library of Congress.


Fig. 4. Comparison of mean prehension forces and
variabilities between hold and manipulation tests for
VO hook wearers. (Ordinate: prehension force in kg.)


Fig. 5. Comparison of mean prehension forces and
variabilities between hold and manipulation tests for
VC hook wearers. (Ordinate: prehension force in kg.)


Discussion 


Neither of the two arguments based on consideration
of prehension force for favoring the
VC principle has been borne out by our in- 
vestigation. It appears that VO device wear- 
ers are just as capable of grading their pre- 
hension force according to the task as VC 
device wearers. This was true for both ma- 
nipulating and holding the test objects. In 
no case did an amputee damage such crush- 
able objects as straws and paper cups by using 
too much prehension force or drop the object 
because of too little prehension force. The 
experimental results seem to indicate clearly 
that for practical purposes the underlying 
principle of operation of his terminal device 
is not particularly relevant to the amputee. 

With respect to the value of having a wider
range of forces available in the VC hook there
is little supporting empirical evidence. The
VO wearers were generally satisfied with the
amount of tension available and the VC wearers
did not utilize the potentially wider range
out of fear of damaging the device.

The results of this investigation are thus in 
full agreement with the previous study using 
a performance time criterion (5) and indicate 
that further functional improvement in future 
terminal device development probably must 
be sought in mechanism simplicity, durability, 
and reliability independently of the underly- 
ing principle of operation. 


Summary 


This study describes an experimental evalua- 
tion of the mode of control of prosthetic ter- 
minal devices in terms of prehension force as 
the performance criterion. 

Twenty unilateral below-elbow amputees,
ten of whom were regular VO hook wearers
and ten regular VC hook wearers, served as
subjects. Simple performance tests requiring
grasping and transport of light objects were
employed to study the effects of voluntary
opening versus voluntary closing mechanisms.
Supplementary information was obtained
from each amputee by a tape-recorded interview.

The results showed that amputees are capable
of controlling the amount of prehension
force precisely enough with both types of device
to avoid crushing objects such as straws
and paper cups. No statistically significant
difference was found in the absolute amount
of prehension force exerted on a particular object
in a specified task by a VO or a VC hook
wearer.

Assessment of the interviews failed to sup- 
port the hypothesis that the VC wearer makes 
use of the wider range of forces which is in- 
herent in his device. 


Received February 25, 1957 


References 


1. Anonymous. Two channel strain gage indicator
and recorder. Rep. 3457, Dept. of Engng,
Univer. of California, Los Angeles, 1947.

2. Fishman, S., & Berger, N. The choice of terminal
devices. Artificial Limbs, 1955, 2, 2.

3. Flanagan, J. C. The critical incident technique.
Psychol. Bull., 1954, 51, 327-358.

4. Fletcher, M. J. New developments in hands and
hooks. In P. E. Klopsteg (Ed.), Human
limbs and their substitutes. New York: McGraw-Hill,
1954.

5. Groth, Hilde, & Lyman, J. Relation of the mode
of prosthesis control to psychomotor performance
of arm amputees. J. appl. Psychol., 1957,
41, 73-78.

6. Walker, H. M., & Lev, J. Statistical inference.
New York: Holt, 1953.





Journal of Applied Psychology 
Vol. 41, No. 5, 1957 


Rate of Force Application in a Simple Reaction Time Test 1


Edmund T. Klemmer * 


Operational Applications Laboratory 
Air Force Cambridge Research Center 


Reaction time (RT) may be defined simply 
as the time interval between the onset of a 
stimulus and the response to that stimulus. 
The stimulus onset is usually sudden and well 
defined in time, but the response onset is 
not. Whether by statement or implication, a 
manual response is usually defined in terms of 
a given required magnitude of force on, or 
displacement of, a device in contact with the 
S’s body. A certain minimum magnitude of 
response must be defined in order to differentiate
the intended response from small involuntary
movements. Since force and its
time integrals are built up in the response 
over time, it follows that the RT will be a 
direct function of the required magnitude of 
the response. 

The present experiment concerns the rela- 
tion between RT and response magnitude with 
a common type of response: key pressing. 
Specifically, the tests are designed to deter- 
mine the rate at which force is applied to a 
pressure RT key under several different con- 
ditions. The results show the changes in RT 
which can be expected as a function of changes 
in force required of the response. Various 
levels of pre-stimulus holding force are also 
studied systematically for their influence upon 
RT. 

Method 
Apparatus 

The S's response key was attached to a Statham
strain gage (Model GI-48-675). The complete key
had a stiffness of .0002 inch per ounce with a fixed
mechanical stop at 45 ounces. A center-zero microammeter
was located just above the key and gave S
immediate knowledge about the force which he held
on the key. The strain gage bridge circuit was offset
to such an extent that when S held the desired force
on the key his meter would read zero.


1 This research was performed at the Operational
Applications Laboratory, Air Force Cambridge Research
Center, Bolling Air Force Base, Washington,
D. C. The present paper is essentially the same as
AFCRC-TR-55-1, dated June 1955. Appreciation is
expressed to Mr. John R. Schjelderup who designed
the apparatus.

* Now at IBM Research Center.

A hole was drilled in the microammeter immedi- 
ately below the zero position of the needle and a 
frosted NE 50 neon lamp was mounted behind this 
hole. This neon lamp served as the stimulus in all 
tests. The stimulus remained on until after the re- 
sponse. 

As soon as S produced the additional force required
of the response, a click sounded on S’s board
to signal a satisfactory response. The pressure key
itself was silent. A warning buzzer was also mounted
on the keyboard.

The E was located in another room with two electronic
triggers connected to the strain gage bridge.
The triggers were activated when the force on S's
key reached levels chosen by E. One Standard Electric
timer, A, would start at the onset of the stimulus
and stop upon the firing of one of these electronic
triggers. Another Standard Electric timer, B, started
on the firing of the first trigger and stopped upon the
firing of the second trigger. The firing of the second
trigger sounded the click on the S’s board, signifying
a satisfactory response.

In addition to the trigger circuits, the D. C. strain
gage amplifier output was connected to the plates of
a cathode ray oscillograph which was fitted with a
continuously moving film camera.


Subjects 


The Ss were two staff psychologists and four college
undergraduates. All of the Ss had previous
training in visual reaction-time tests and three had
served in preliminary experiments on the force key
apparatus.


Procedure 


Two experimental variables were changed between 
runs: (a) the force which S was required to hold on 
the key before stimulus onset; (b) the additional 
force required to produce the click signaling a satis- 
factory response. The S was always informed with 
respect to the values of holding force and additional 
force required for the response. All Ss were given 
practice trials to acquaint them with the range of 
holding force and additional force required. Correct 
holding force was always indicated by zero position 
of the microammeter and sufficient response force by 
the click. All Ss were able to maintain the correct 
holding force within 1 ounce even for the test requir- 
ing 20 ounces. 

In all, five different pre-stimulus holding forces 
were used and two different response forces with each 










Table 1 


Description of Tests and Mean Clock Times of 120 Trials on Each of 6 Ss in Each Test 
(Numbers in parentheses are standard deviations calculated from averaged variances of runs of 40 responses.) 





Columns: (1) Test; (2) Holding force (oz.); (3) Added force req'd (oz.);
(4) RT to first added oz., 1 req'd (msec.); (5) RT to first added oz.,
20 req'd (msec.); (6) RT to add 20 oz. (msec.); (7) Rise time 1-20 oz. (msec.)




1 
20 
1 


20 
5-1 : 1 
5-20 : 20 
10-1 1 
10-20 20 
20-1 1 
20-20 20 


Mean 


Mean excluding test 0-20 


168 (29) 


169(29) 


169(23) 200 31*(9) 


166(30) 


168 (27) 209 41(9) 


168 (27) 


164 (26) 38(7) 


168 (29) 


167(27) 39(8) 


168 (30) 


169(29) 
168 (27) 


39(12) 
37(9) 
39(9) 


* Significantly different from other times in same column (see text). 


holding force. Column 2 of Table 1 gives the hold- 
ing force and Column 3 the added force required for 
each of the 10 tests. 

Each run of each test consisted of 40 stimulus 
presentations spaced roughly ten seconds apart. The 
warning buzzer was sounded briefly before each 
stimulus with irregular foreperiods between one and 
two seconds. Each S was given test runs in the fol- 
lowing balanced design. Three Ss started with Test 
0-1 and took one run on each of the ten tests in the 
order listed in Table 1, then repeated one run on 
each test in inverse order, and finally repeated each 
test a third time in the original order. The other 
three Ss began with Test 20-20 and made their first 
10 runs in inverse order, the second 10 in the order 
of Table 1 and the final 10 runs in inverse order 
again. Thus, each S had a total of 120 stimulus
presentations on each test.
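The balanced order of runs can be written out explicitly. The sketch below is an illustration under stated assumptions: the ten test labels follow the "holding force - added force" pattern used in Table 1, and the 2-ounce holding level is inferred from the Summary rather than read from the table, so the label list should be treated as hypothetical.

```python
# Test labels: "holding force - added force" in ounces (the 2-oz level is assumed).
TESTS = ["0-1", "0-20", "2-1", "2-20", "5-1", "5-20",
         "10-1", "10-20", "20-1", "20-20"]
TRIALS_PER_RUN = 40


def run_order(start_with_table_order):
    """Thirty runs: three passes through the ten tests, alternating direction."""
    forward = list(TESTS)
    inverse = list(reversed(TESTS))
    if start_with_table_order:
        passes = [forward, inverse, forward]
    else:
        passes = [inverse, forward, inverse]
    return [test for one_pass in passes for test in one_pass]


group_a = run_order(True)    # three Ss beginning with Test 0-1
group_b = run_order(False)   # three Ss beginning with Test 20-20

trials_per_test = {t: group_a.count(t) * TRIALS_PER_RUN for t in TESTS}


def mean_serial_position(test):
    """Average run position (1-30) of a test over both groups of Ss."""
    positions = [i + 1 for order in (group_a, group_b)
                 for i, t in enumerate(order) if t == test]
    return sum(positions) / len(positions)


mean_positions = {t: mean_serial_position(t) for t in TESTS}
```

Taken over both groups, every test has the same average serial position, which is the sense in which practice and fatigue effects are balanced across tests in this arrangement.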

The electronic trigger which stopped Clock A and 
started Clock B was always set one ounce above the 
holding force. Thus, Clock A always recorded the 
time between stimulus onset and one added ounce of 
force on the key. The second trigger which stopped
Clock B and sounded the response click on S’s board 
was set at either 1 ounce or 20 ounces according to 
the additional force required of the response as shown 
in Table 1. 
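The relation between the two triggers and the two clocks can be sketched as follows. The force trace here is hypothetical (a steady holding force followed by a linear rise) and serves only to show how the clock readings follow from the trigger levels; it is not a record from the experiment, and the function names are chosen for the example.

```python
def first_crossing_msec(trace, threshold_oz):
    """Time (msec) of the first sample at or above the threshold, else None."""
    for t_msec, force_oz in trace:
        if force_oz >= threshold_oz:
            return t_msec
    return None


def clock_times(trace, holding_oz, required_added_oz):
    """Clock A: stimulus onset to first trigger (holding force + 1 oz).
    Clock B: first trigger to second trigger (holding force + required force)."""
    t_first = first_crossing_msec(trace, holding_oz + 1)
    t_second = first_crossing_msec(trace, holding_oz + required_added_oz)
    return t_first, t_second - t_first


# Hypothetical trace sampled every msec: 5 oz held, then 0.5 oz added per msec
# beginning 168 msec after stimulus onset.
holding = 5.0
trace = [(t, holding if t < 168 else holding + 0.5 * (t - 168))
         for t in range(0, 301)]

clock_a_20, clock_b_20 = clock_times(trace, holding, 20)
clock_a_1, clock_b_1 = clock_times(trace, holding, 1)
```

With a 1-ounce response both triggers sit at the same level, so Clock B reads zero; with a 20-ounce response the sum of the two clocks is the time from stimulus onset to 20 added ounces.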

Oscillograph records were taken on six out of 
every 40 trials. The oscillograph was not available 
until two Ss had completed their runs so that the
film records represent only four of the six Ss.


Results 


The times from the onset of stimulus to the 
first added ounce of the response are given in 
Columns 4 and 5 of Table 1. The times to 
add 20 ounces, when required, are given in 
Column 6 of Table 1. All time entries in the 
body of the table are means of 720 responses, 
120 for each of six Ss for each test. 

Note that the RT measured to the first 
ounce of the response (Columns 4 and 5) is 
just about the same for all 10 tests. The 
highest (169 msec.) is only 5 msec. different 
from the lowest (164 msec.). This shows that 
the time required to add one ounce is nearly 
independent of the holding force and also in- 
dependent of the total additional force re- 
quired of the response. These findings are 
corroborated by the film record analysis be- 
low. 

Fig. 1. Rate of force application curves for each
of five tests with different pre-stimulus holding forces.
Data averaged on time axis from 18 trials for each
of four Ss. (Axes: total force in ounces; time in
msec. from stimulus onset.)


The time taken to go from 1 added ounce
to 20 added ounces is given in Column 7. For
tests with more than zero holding force this
rise time is constant, with no tests being more
than two milliseconds from the mean of the
four tests. The test with zero holding force
(Test 0-20) showed an 8 msec. faster force
rise than the mean of the other four tests.
Its rise time of 31 msec. is significantly different
from the next fastest test (Test 5-20)
at the .01 level for five of the six Ss on the
basis of a t test of the difference between
mean RTs.
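The rise times can be turned into average rates of force application by dividing the 19 ounces between the two triggers by the elapsed time. The sketch below uses only figures stated in the text and Table 1 (200 and 169 msec. for Test 0-20, the 39-msec. mean of the other four tests, and the 37-msec. over-all mean); it assumes a constant rate between the triggers, which the force curves only approximate.

```python
ADDED_SPAN_OZ = 20 - 1  # ounces between the 1-ounce and 20-ounce triggers


def rate_oz_per_sec(rise_time_msec):
    """Average rate of force build-up between the two triggers."""
    return ADDED_SPAN_OZ / (rise_time_msec / 1000.0)


# Rise time for Test 0-20 is the difference of the two clock means in Table 1.
rise_0_20 = 200 - 169

rise_times_msec = {
    "Test 0-20": rise_0_20,
    "mean of other four tests": 39,
    "mean of all five tests": 37,
}
rates = {label: round(rate_oz_per_sec(ms)) for label, ms in rise_times_msec.items()}
```

On these figures the group averages fall near 500 to 600 ounces per second.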

Figure 1 shows mean force rise curves for 
each test requiring a 20-ounce response and is 
based on the available film records (four of 
the six Ss; six of each 40 trials). Note that 
all five curves of Fig. 1 leave the base line at 
about the same point along the time axis. 
This agrees with the RTs measured by the
1-ounce trigger which failed to differentiate
among tests (Table 1).

The slopes of the force curves of Fig. 1 are 
about the same for all tests except for the test 
using zero holding force (Test 0-20) which 
rises a little more steeply than the others. 
This bears out the clock times which indi- 
cated a steeper rise for this test. 

In order to get some idea about individual
differences in rate of force application, the
available film records were averaged over
trials and tests for each S separately. Only
the five tests requiring the 20-ounce response
were averaged since the other tests often resulted
in responses of only a few ounces. The
rate of force application curves for each of
the four Ss who had film records are presented
in Fig. 2. Figure 2 shows considerable differences
in slope among the four Ss. Analysis of
the clock times between the 1- and 20-ounce
triggers for these same Ss shows that each S
produced a significantly different rate of force
application from each of the other Ss with the
exception of C and D who have almost equal
rise rates.


Fig. 2. Rate of force application curves for each
of four Ss. Data averaged on time axis from 18
trials on each of the five tests with 20-ounce trigger.
The pre-stimulus holding force (not shown) varied
from zero to 20 ounces over tests. (Axes: added
force in ounces; time in msec. from stimulus onset.)

Figure 2 also shows that the Ss with slower
rate of force application also leave the base
line of holding force later. In other words,
the Ss who are slow in starting are also slow
in rate of force build-up. This conclusion is
further borne out by the correlation across Ss
between the mean RTs for the 1-ounce trigger
and time to add an additional 19 ounces.
These correlations vary from +0.34 to +0.70
over the five tests.

We may now ask about the relation between
starting speed and rate of force build-up
within each S’s data separately. This relation
was determined by correlating the RT
as measured by the 1-ounce trigger with the
time to add an additional 19 ounces. The
product-moment correlations were determined
separately for each 40-trial run, converted by
Fisher’s z transformation, averaged, and then
reconverted to correlations. The combined
correlations were all small and negative. The
range over tests (combined Ss) was -0.17
to -0.26. The range over Ss (combined
tests) was -0.12 to -0.30. This simply
means that within any one S there is some
tendency for slower starts to be accompanied
by faster force build-ups.
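The averaging of correlations through Fisher's z can be sketched in a few lines. The run correlations below are placeholders chosen for illustration, not values from the experiment, and the function name is an assumption of the example.

```python
import math


def average_correlation(run_correlations):
    """Convert each r to Fisher's z, average the z values, convert back to r."""
    z_values = [math.atanh(r) for r in run_correlations]
    return math.tanh(sum(z_values) / len(z_values))


# Placeholder per-run correlations (illustrative only).
example_runs = [-0.10, -0.25, -0.30]
combined = average_correlation(example_runs)
```

The transformation is used because correlations themselves are not on an equal-interval scale; averaging in z and reconverting gives a combined value that lies within the range of the separate correlations.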

Finally, the trial-to-trial variability is given
by the set of standard deviations of clock
times appearing in parentheses in Columns 4,
5, and 7 of Table 1. These SDs are based on
averaged variances from each run of 40 trials.
Note that the range of SDs of RTs for the
1-ounce trigger among the 10 tests is only 23
to 30 msec. (Table 1, Columns 4 and 5).
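An SD "based on averaged variances" is obtained by squaring the SD of each run, averaging the squares, and taking the square root. The sketch below illustrates the computation for runs of equal length; the function name and the example values are assumptions made for the illustration, not figures from Table 1.

```python
import math


def sd_from_averaged_variances(run_sds):
    """Combine per-run SDs by averaging their variances (equal run lengths)."""
    variances = [sd ** 2 for sd in run_sds]
    return math.sqrt(sum(variances) / len(variances))


# Illustrative values only: two runs with SDs of 3 and 4 msec.
combined_sd = sd_from_averaged_variances([3.0, 4.0])
simple_mean_of_sds = (3.0 + 4.0) / 2
```

The combined value is never smaller than the simple mean of the SDs, and it excludes run-to-run shifts in mean RT, so it reflects trial-to-trial variability within runs only.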

The SDs of the rise times for the five tests 
having both 1- and 20-ounce triggers appear 
in Column 7 of Table 1. Test 20-20 has a 
SD of rise time for all Ss which is 3 msec. 
above any of the other tests, but this high
value is due entirely to only two of the six 
Ss who showed high variability in this par- 
ticular test. Thus, there is no convincing 
evidence of the effect of holding force on the 
variability of either the RT measured to the 
first ounce of response or the rise time to 20 
added ounces. 


Discussion 

The most striking finding in this study is
the constancy of the rate of force application
curves under widely varying conditions.
Varying the amount of force S was required
to hold on the key previous to the stimulus
had little or no effect upon the mean RT
measured to the first ounce of the response
nor upon the variability of this RT. The RT
measured to the first ounce of the response
was the same when a total response of 20
ounces was required as when only 1 ounce
was required. Also the rate of force application
between 1 and 20 ounces was not much
affected by the pre-stimulus holding force, although
the zero holding force condition did
result in a statistically significant faster force
build-up.

It should be emphasized that the maximum 
rate of force application found in these tests 
applies to the reaction time situation only. 
These same Ss could apply force at a much 
higher rate than the 500 to 1,000 ounces per 
second indicated in Fig. 2 if the instructions 
were merely to apply force as rapidly as pos- 
sible. 

It should be remembered also that the force
measured in the present tests was a fairly direct
expression of the immediate muscular
force since there was little or no acceleration
force involved. If the situation is one in
which there is movement of the S’s hand or
arm, muscular forces are first expressed as an
acceleration of the limb and later reconverted
to force by impact on an external control.
The rate of force application on the external
control under these circumstances can be extremely
high, but a penalty is exacted in increased
RT. In the same way, an increase in
RT can be expected if a response of considerable
displacement is required. A displacement
response will involve muscle dynamics
which are different from the pressure key
situation so that the present results cannot be
applied directly to moving response levers.


Summary 


An electrical strain gage was fitted to a 
pressure key and continuous force records 
were taken during a simple reaction-time 
experiment. Various levels of holding force 
previous to stimulus onset were required of S 
with two widely different amounts of addi- 
tional force required for the response. 

The results of this study may be summa- 
rized as follows: 


1. RT measured to the first ounce of the 
response is independent of pre-stimulus hold- 
ing forces of zero to 20 ounces. 

2. RT measured to the first ounce of the
response is the same when a 20-ounce response
is required as when only a 1-ounce
response is required.

3. Rate of force application between 1 and 
20 added ounces is independent of holding 
forces from 2 to 20 ounces. Zero holding 
force results in a slightly higher rate of force 
application. 

4. The mean rate of force application for
six Ss is such that 37 msec. are required to build
up from 1 added ounce to 20 added ounces.

5. Between Ss, starting speed of the re- 
sponse is positively correlated with rate of 
force build-up. Within Ss, starting speed is 
negatively correlated with rate of force 
build-up. 

6. Variability of RT is unaffected by pre- 
stimulus holding forces of zero to 20 ounces. 


Received February 28, 1957. 





Journal of Applied Psychology 
Vol. 41, No. 5, 1957 


Positive and Negative Faking on a Forced-Choice
Authoritarian Scale 1


Walter A. Kaess and Sam L. Witryol 


University of Connecticut 


A serious contamination in the evaluation 
of scores obtained from group personality 
tests has resulted from the problem of faking. 
For various reasons testees frequently feel 
motivated to make themselves “look good,” 
and this bias seriously confounds problems of 
diagnosis and selection (7, 18). The general 
problem becomes especially critical when test 
items are too obviously transparent, and for 
this reason forced-choice questionnaires have 
received considerable attention. The forced- 
choice test form represents an attempt to 
minimize faking (2, 5, 8). The present in- 
vestigation was designed to test the effects of 
faking, “looking bad” as well as “looking 
good,” upon a forced-choice scale of authori- 
tarianism recently published by Jones (10). 
This instrument appears to have relevance to 
the study of civilian, as well as military, lead- 
ership; it may provide a bridge between the 
widely discussed and investigated concepts 
generated from the California F Scale in The 
Authoritarian Personality (1) and selection of 
potential leaders in business and industry. 

Jones tried to resolve two limitations of the
F Scale as a personality measure: contamination
with political attitudes and sensitivity to
response set or fakability (4, 15). His forced-choice
questionnaire, the Pensacola Z Scale,
contains items more congruent with personality
dimensions, rather than with the liberalism-conservatism
political dimension. It also
represents an important effort to minimize the
testee’s tendency to make a favorable impression.
The present paper deals with four problems:
(a) the susceptibility of this forced-choice
scale to faking, “looking bad,” as well
as “looking good”; (b) the construct validity
of the Pensacola Z, the correlation with variables
conceptually related to authoritarianism;
(c) the effects of positive and negative faking
upon the construct validity of this test; and
(d) theoretical issues in the forced-choice
form of questionnaire.


1 This work was supported by Contract Nonr-1886
(00) between the Office of Naval Research and The
University of Connecticut.

Fakability of forced-choice instruments has
received considerable attention in the recent
research literature. In the Jurgensen Classification
Inventory, it has been demonstrated
that when subjects were instructed that self-confidence
was being measured, they were able
to falsify scores on that factor (12, 13). This
particular inventory was constructed so that
choices were forced among item groups homogeneously
paired in either socially acceptable
or socially undesirable categories. When instructions
directed subjects to make a good
impression without knowledge of the factor
measured, the mean score did not increase
compared to the mean score under conventional
administration, although individual
scores frequently changed. Furthermore, mean
scores were essentially the same under simulated
industrial selection instructions compared
to a vocational guidance set, but
once more individual scores changed (12).
Another forced-choice personality scale, the
Gordon Personal Profile, was administered to
high school subjects for guidance purposes,
and later, for job applications; individual
scores changed moderately (8). Essentially
the same results were obtained in a study employing
college students under simulated conditions
similar to those reported in the preceding
investigation (14).

The research examples cited illustrate that various
conditions of set and motivation influence scores
obtained with the forced-choice technique, although it
may be inferred that this test form provides advantages
over some of the older techniques. Wesman found that
the distributions for the self-confidence factor on the
Bernreuter were very different when 85 students were
instructed to mark the test as salesmen might, and again
later as librarians might, respond (18). Heron reported
marked differences between adult job applicants and a
control group in the strong tendency for prospective
employees to present a favorable response pattern on a
test of emotional maladjustment (9). Realistic
proponents of the forced-choice approach recognize,
however, that differential motivation in various life
situations may affect all test forms, and these
conditions must be continually evaluated by research.

The personality test employed in this study 
was developed by Jones (10) as a nonpoliti- 
cal measure of authoritarianism, stemming 
from concepts employed in the California F 
Scale (1). Jones identified personality corre- 
lates of the F Scale by studying relationships 
with the Thurstone Temperament Schedule 
and the Guilford-Zimmerman Temperament 
Survey. By means of various analyses, 
he reasoned that the nonpolitical, generic 
traits of the authoritarian personality con- 
sisted of anxiety, hostility, rigidity, and de- 
pendency. Jones constructed a forced-choice 
questionnaire, the Pensacola Z Scale, reflect- 
ing these traits, and studied the effects of 
faking. A high score on this test characterizes
the heteronomous (authoritarian) personality
with tendencies toward rigidity, obedience,
and lack of initiative; a low score
characterizes the autonomous individual who
is adaptable and reasonable, presumably more
promising for naval leadership. The mean
score of naval aviation cadets instructed to 
“gouge” and “beat” the test did not differ 
significantly from the mean of those taking 
the test under standard instructions, although 
extreme scores less frequently occurred under 
the faking set (10). This contrasts some- 
what with recent findings demonstrating the 
susceptibility of the F Scale to faking as a 
function of different response sets (4, 15). 

In spite of apparent simplicity, the forced- 
choice questionnaire form contains subtle theo- 
retical assumptions (3, 17). In order to ex- 
amine one of these assumptions, we have used 
the set of the subjects as a parameter and 
have observed changes in score distributions 
and in concept-relevant correlational measures. 
Testees are sometimes inclined to present unfavorable
(as well as favorable) pictures of themselves. This
negative set has received little research attention.
Specifically, the present investigation was designed to
study comparative score distributions of the Pensacola
Z Scale resulting from conventional instructions and
from instructions to fake positively and negatively,
respectively, and to examine the consequences of these
differential sets for score relationships with the
Guilford-Zimmerman Temperament Survey and the
Allport-Vernon-Lindzey Study of Values.


Method 


The Pensacola Z Scale was administered to Intro- 
ductory Psychology students under three conditions 
of set: neutral, favorable, and unfavorable. Each of 
three groups of subjects was exposed to only one of 
the set conditions. Two items were eliminated from 
the scale because the sexual content seemed inappro- 
priate (Items 2 and 18). 

Neutral Set. From each pair of statements the 
subject was instructed in the customary manner to 
select the one which best described him. This con- 
dition was administered to 216 males and 306 fe- 
males. 

Favorable Set. In this condition, 48 males and 55 
females were instructed as follows: 

“This type of questionnaire is usually considered 
extremely difficult to fake so as to give a favorable 
impression. Let’s see how good a job you can do in 
faking it, that is, making yourself look as favorably 
as possible. 

“Imagine you are applying for a job that pays 
$10,000 a year, that you have all of the abilities to 
successfully handle the job, and that you would en- 
joy the job. The sole barrier between you and the 
job depends upon your score on this test. Quite 
apart from the true answers to the questions, you 
are to mark those answers which will give the most 
favorable impression. Attempt to beat the test; give 
the best answers regardless of whether you are tell- 
ing the truth about yourself. Remember to give the 
most favorable impression.” 

Unfavorable Set. Instructions for this condition to 
43 males and 51 females were: 

“This type of questionnaire is usually considered 
extremely difficult to fake so as to give an unfavor- 
able impression. Let’s see how good a job you can 
do in faking it, that is, in making yourself look as 
bad as possible. 

“Imagine that in making yourself look bad on this 
test you can avoid an unpleasant responsibility. The 
sole excuse for you to be trapped in this situation is 
the test score. You want a bad test score to get out 
of it. Quite apart from the true answers to the ques- 
tionnaire, you are to mark those answers which will 
give the most unfavorable impression. Attempt to 
beat the test. Give the answer you think will put 
you in the worst possible light, regardless of whether
you are telling the truth about yourself. Remember,
give the most unfavorable impression.”

Personality scores for these subjects on the G-Z 
Temperament Survey and the Study of Values were 
available for analysis as part of a programmatic re- 
search sequence already in progress (11, 19). Tests 
were administered in sections approximating 30 sub- 
jects in each. The sequence of test administration 
was: (a) Temperament Survey; (b) one week later, 
Pensacola Z with set conditions randomly assigned to 
sections; and (c) Study of Values four weeks after 
the Pensacola Z Scale. Simple analyses of variance 
of the set conditions for each sex were calculated on 
the scores obtained from the Temperament Survey 
and Study of Values in order to analyze possibilities 
of selection errors. The set groups did not differ sig- 
nificantly on these two variables; in no analysis was 
the null hypothesis rejected. 
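
The selection-error check described above is a simple (one-way) analysis of variance across the three set groups. The following Python sketch illustrates that computation under stated assumptions: the score lists are hypothetical and are not data from this study, and the function name `one_way_anova_f` is introduced here only for illustration.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way analysis of variance."""
    all_scores = [score for group in groups for score in group]
    grand_mean = mean(all_scores)
    # Between-groups sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: scores around their own group mean.
    ss_within = sum((score - mean(g)) ** 2 for g in groups for score in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical temperament scores for the three set groups (illustration only).
neutral = [14, 17, 15, 16, 18, 13]
favorable = [15, 16, 17, 14, 18, 15]
unfavorable = [16, 15, 13, 18, 14, 17]

f_ratio, df_between, df_within = one_way_anova_f([neutral, favorable, unfavorable])
print(f"F({df_between}, {df_within}) = {f_ratio:.2f}")
```

An F ratio below the tabled critical value for the given degrees of freedom would, as in the text, leave the null hypothesis of no group differences standing.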


Results 


Table 1 contains the Pensacola Z means 
and standard deviations of university students 
in this study compared to naval aviation 
cadets studied by Jones (10), under various 
conditions of set. This table combined with 
appropriate statistical tests of significance in- 
dicates the following results: 

1. The distributions, means, and standard 
deviations of university males and females 
under conventional conditions of test ad- 
ministration and under instructions to fake 
favorably are similar to those obtained by 
Jones with naval aviation cadets. This re- 
markable agreement is especially striking when 
one considers that our university population 
mainly consisted of freshmen and sophomores 
who took the scale with two items removed 
and who were compared to highly selected 
naval aviation cadets on a sensitive scale for 
authoritarianism. Although instructions to
fake favorably in the university group resulted
in statistically significant mean score
increases, the magnitudes were small and
similar in direction to the Pensacola data.
Moreover, the set to fake was not precisely 
the same in each of the comparative popula- 
tions. In general, these findings confirm the 
relative resistance of the forced-choice tech- 
nique to gross distribution changes as a con- 
sequence of faking favorably, and are con- 
sistent with the literature examples cited 
above. One wonders whether aviation cadet 
score distributions would be congruent with 
our college sample under instructions to fake 
unfavorably. The implications are provoca- 
tive as will be demonstrated in the analyses 
below. 

A double classification analysis of variance 
for sex and set yielded the following results 
for the Pensacola Z Scale: 

2. The F between sets is 79.55 (F.001 = 6.91);
the F between sexes is 5.83 (F.05 = 3.84;
F.01 = 6.64); and the set-sex interaction is not
significant. The major source of variance is very
obviously between the sets. An unfavorable set lowers
the mean score of the scale approximately 1½ sigma
units, thus raising serious questions about the
resistance of this particular forced-choice scale to
negative faking. Our college population interpreted the
unfavorable set as encouraging responses which resulted
in lower scores in heteronomy. In other words, they
felt that they made themselves look bad by marking
responses which the test constructor had established as
indicating freedom from authoritarianism (10). Low
scores in heteronomy reflect high positions in autonomy
which, in turn, should be good indicators of naval
leadership. Although this perception of items
reflecting authoritarianism as desirable is much more

Table 1


Means and SDs for University Students and Naval Aviation Cadets on the Pensacola Z


               Naval Aviation Cadets      University Students: Females
Set            N       Mean     SD        Mean     SD

Neutral        311     35.31    6.85      34.87    6.22
Favorable      196     36.18    5.69      37.15    6.10
Unfavorable                               25.90    5.83








Table 2


Correlations Between Pensacola Z and Selected Factors on the G-Z Temperament Survey
and the Study of Values


Rows: Set for Z (Neutral, Favorable, Unfavorable), each by Sex (M, F, Total).
Columns: G-Z Survey factors A = Ascendance; E = Emotional Stability;
O = Objectivity; M = Masculinity; F = Friendliness. Study of Values scales:
Economic; Aesthetic.


pronounced in the scores of the university
population under the unfavorable set, there is
also a slight tendency in the direction of
higher heteronomy (authoritarian) scores under
the favorable set. This small change
would not have practical significance if scores
obtained under conditions of positive faking
are the same as those obtained under neutral
instructions. This latter contingency will be
examined in the analysis of construct validity.
It is of interest to note that the relative
desirability of heteronomy items under the
favorable set is slightly more pronounced for
female than for male students (t = 2.78); the
smallness of sex differences is consistent with
findings reported on the F Scale in The
Authoritarian Personality (1), but the
directions are reversed.
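
The size of these shifts in sigma units can be checked with a short calculation. The sketch below is illustrative only: it uses the female university means and the neutral-set SD as read from Table 1, and it assumes that the neutral-set SD is the sigma unit intended; the function name `sigma_shift` is hypothetical.

```python
def sigma_shift(mean_condition, mean_neutral, sd_neutral):
    """Express a mean difference in units of the neutral-set standard deviation."""
    return (mean_condition - mean_neutral) / sd_neutral


# Female university students, values as read from Table 1.
neutral_mean, neutral_sd = 34.87, 6.22
favorable_mean = 37.15
unfavorable_mean = 25.90

unfavorable_shift = sigma_shift(unfavorable_mean, neutral_mean, neutral_sd)
favorable_shift = sigma_shift(favorable_mean, neutral_mean, neutral_sd)

print(f"unfavorable set: {unfavorable_shift:+.2f} sigma")  # about -1.44
print(f"favorable set:   {favorable_shift:+.2f} sigma")    # about +0.37
```

On these figures the unfavorable set lowers the mean by roughly one and one-half sigma units, while the favorable set raises it by well under half a sigma unit.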

Table 2 presents correlations between the 
Pensacola Z, under various conditions of set, 
with selected factors on the Guilford-Zimmer- 
man Temperament Survey and on the Study 
of Values. Only those factors which survived 
statistical tests of significance in at least one 
of the conditions of set are included in the 
table. The following results are suggested: 

1. In general the relationships established 
under the neutral set are compatible with the 
description of the autonomous personality presented
by Jones (10). Before studying the effects of faking,
it is appropriate to examine
construct validity (6) under the conventional 
test administration set. As has been indi- 
cated, Jones identified factors of anxiety, ri- 
gidity, and hostility emerging from exploring 
relationships of the California F Scale with 
objective personality inventories, and he added 
a fourth factor, dependency. These traits 
were the bases for his forced-choice test. The 
correlations obtained in the present study 
seem to confirm the hypotheses of Jones (10) 
and the California researchers (1), and are 
similar for both sexes. For example, a high 
score on heteronomy (authoritarianism) re- 
flects, from the temperament correlates in 
Table 2 (Factors A, E, and O), dependency, 
emotional instability, and sensitivity to criti- 
cism. Furthermore, the negative relationships 
with Aesthetic values and the positive rela- 
tionships with Economic values are in agree- 
ment with the conceptualizations of the Cali- 
fornia group (1, p. 228). These data support 
the construct validity of this particular forced- 
choice scale under standard instructions. 
Obtained correlations with the Masculinity 
and Friendliness scales appear to be different 
for males compared to females. It is likely 
that a Masculinity score does not have exactly 
the same meaning for males as for females. 
The data suggest that heteronomous females
are more effeminate, and autonomous females
are more masculine in interests. The difference
between obtained correlations on Friendliness
for males and females is puzzling. Any
reliable inference would be most complicated
since the difference is not statistically
significant, and only one of the obtained
coefficients (for males) is sufficiently large
to reject the null hypothesis.
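
The text does not state which test was applied to the difference between the male and female correlations. One standard choice, assumed here purely for illustration, is Fisher's r-to-z transformation for two independent correlations. In the sketch below the Ns are the neutral-set sample sizes given in the Method section, while the two correlation values are hypothetical.

```python
import math


def fisher_z_difference(r1, n1, r2, n2):
    """Normal-deviate test of the difference between two independent correlations."""
    z1, z2 = math.atanh(r1), math.atanh(r2)  # Fisher r-to-z transformation
    standard_error = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (z1 - z2) / standard_error


# Ns from the neutral set (216 males, 306 females); r values are hypothetical.
r_males, n_males = -0.20, 216
r_females, n_females = -0.05, 306

z_stat = fisher_z_difference(r_males, n_males, r_females, n_females)
significant_at_05 = abs(z_stat) > 1.96  # two-tailed .05 criterion
print(f"z = {z_stat:.2f}, significant at .05: {significant_at_05}")
```

With these illustrative values one coefficient could differ reliably from zero while the difference between the two coefficients does not, which is the pattern the paragraph describes.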

The analysis of the effects of set conditions
upon the correlations in Table 2 presented
special problems, because the N’s by sex were
relatively small under positive and negative
faking conditions for sensitive tests of
significance of differences or of the null hypothesis.
None of the tests of sex differences between 
correlations reached significance. Standard 
scores by sex were calculated for faking con- 
ditions, and the sexes were combined to yield 
larger N’s. This standard score method was 
selected to control for sex differences on the 
variables. Correlations thus computed for 
males and females combined are presented in 
the rows marked “Total” in Table 2, and 
these results follow: 

2. Statistically significant relationships
established between the Z Scale and various
personality factors under the conventional test
administration set disappear under instructions
to fake either positively or negatively.
Construct validity still obtains under the
neutral set for the combined (both sexes) total
correlation coefficients based on standard
scores, but it is not manifested under either
of the sets to fake. None of the 14 correlations
computed by combining sexes reaches
statistical significance for faking conditions;
correlations calculated for each sex also fall
short of significance. It seems that, even on
a forced-choice questionnaire, the attitudes of
the testees are critical determinants of
construct validity.
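
The standard-score method used for the “Total” rows can be sketched as follows. This is a minimal illustration, assuming that each variable was converted to standard scores within each sex before the sexes were pooled; the data and the function names are hypothetical and do not come from the study.

```python
from statistics import mean, pstdev


def standardize(scores):
    """Convert raw scores to standard scores within one group."""
    m, s = mean(scores), pstdev(scores)
    return [(x - m) / s for x in scores]


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length score lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def pooled_r_from_sex_standardized(z_by_sex, other_by_sex):
    """Standardize both variables within each sex, pool, then correlate."""
    pooled_z, pooled_other = [], []
    for sex in z_by_sex:
        pooled_z += standardize(z_by_sex[sex])
        pooled_other += standardize(other_by_sex[sex])
    return pearson_r(pooled_z, pooled_other)


# Hypothetical Pensacola Z and temperament scores (illustration only).
z_scores = {"M": [31, 36, 40, 33, 38], "F": [35, 39, 42, 30, 37]}
temperament = {"M": [18, 14, 11, 17, 15], "F": [20, 16, 12, 22, 17]}
print(round(pooled_r_from_sex_standardized(z_scores, temperament), 2))
```

Standardizing within sex removes mean and variance differences between males and females, so the pooled coefficient reflects only the within-sex relationship.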

Discussion 

Our results indicate that the forced-choice 
derivative of the California F Scale possesses 
substantial construct validity. The Pensacola 
Z Scale has strong implications for use as a 
research tool in studies of industrial training, 
supervision, and leadership. The political
dimension of authoritarianism, of questionable
value in industrial psychology, has been
removed, but Jones has reported a substantial
correlation, .43, between the Z and the F in 
a cross-validation (10). Thus, industrial psy- 
chologists may draw upon the rapidly expand- 
ing literature in social psychology relating to 
the California F. A recent review of studies 
on the California F (16) cites evidence which 
relates authoritarianism to leadership. Infer- 
ences drawn are that authoritarians may lack 
social intelligence or social perception and 
may be relatively insensitive to significant 
personal cues for smooth functioning in power 
situations. 

Most previous fakability studies have examined
attempts to falsify responses in a socially desirable
direction, as perceived by the testee. Obviously life
circumstances sometimes impel the testee to attempt to
present unfavorable impressions in order to avoid
responsibility, or because certain types of tests are
perceived as threatening. Our findings strongly
indicate that fakability of personality instruments
should more frequently be investigated under a
condition of negative, as well as positive, set
whenever there is reason to believe that subjects may
be motivated to dissemble for the purpose of presenting
an unfavorable test pattern. Consequences of negative
falsification may be unexpected direction in score
changes and modification of what the instrument is
measuring as defined by criteria correlates.

Our objective in employing arbitrary extreme set
conditions of fakability as a parameter was an attempt
to examine theoretical aspects of the forced-choice
approach, rather than a conclusive demonstration of the
susceptibility of one particular scale. The naive
application of the forced-choice form has some basic
practical consequences. The key question is: Does the
testee’s attitude toward the questionnaire affect the
validity of his test score? Simple studies of
distribution changes do not resolve this issue.

Consider the consequences from attempting to fake
unfavorably, as analyzed in the present study. Under
this condition, there was a very great shift in the
score distribution from that obtained under
conventional administrative procedures, and the
direction of the shift
was psychologically significant, but hardly
predictable. The similarity of the distribu- 
tions of the Z Scale for the university and 
cadet populations suggests an intriguing ap- 
plication from the results of this investigation. 
Let us suppose two candidates are taking the 
Z Scale as one of a battery of selection tests 
for Naval Aviation Officer Training. Candi- 
date A wants very much to make a favorable 
impression in order to be selected. Candidate 
B would rather not be chosen for officer re- 
sponsibilities and tries to make an unfavor- 
able impression. If our findings can be gen- 
eralized to naval aviation cadets, there is a 
strong probability that Candidate B will tend 
to make the opposite impression from the one 
he is attempting. On the basis of this objec- 
tive test, Candidate B may be considered a 
successful, if reluctant, candidate! The Pen- 
sacola and Connecticut data are consistent in 
implying that Candidate A’s attempt to “beat
the test” probably will not work, but to the
extent that it does work, there will be a slight 
opposite effect from that intended. Further- 
more, faking in either direction “spoils” the 
Z Scale in the sense that statistically signifi- 
cant correlates disappear as demonstrated in 
Table 2. Construct validity under these cir- 
cumstances may be either lost or unpredict- 
ably rearranged. 

If construct validity is rearranged under 
extreme conditions of faking, then the sets 
employed may provide material for exploring 
consequences of social and personal pressures. 
The problem becomes not one of selecting a 
forced-choice instrument over a standard per- 
sonal report form, but it involves a series of 
empirical discriminations concerning the na- 
ture of instructions employed for a certain 
test as related to particular situations such as 
military screening, industrial selection, or vo- 
cational guidance (7, 12, 17). The relative 
resistance of the forced-choice approach to 
various degrees of response set should be ex- 
amined by individual score analysis for classes 
of tests in broad types of test settings. 

The rationale of the forced-choice technique 
raises further critical questions. If the items 
are paired for equivalence of preference value 
under standard conditions, does this equivalence
still hold under systematic response
biases to fake? Since the testee or the rater
cannot be certain which of the choices operate 
in the direction of his intended bias, what is 
the final impact of a response set? The an- 
swers to these questions may well result in 
empirical shifts of the kind obtained in this 
investigation under the set to fake unfavor- 
ably. The subjects achieved scores opposite 
from their intentions and opposite from what 
testers would expect. This presents implica- 
tions for rating others, as well as for self rat- 
ings, when forced-choice instruments are ap- 
plied to selection in military or industrial 
settings where systematic response biases may 
exist. 


Summary 


The effects of faking both favorably and 
unfavorably upon a forced-choice scale of au- 
thoritarianism, the Pensacola Z Scale, were 
studied in this investigation. Distributions 
obtained from a university population under 
conventional instructions and under a set to 
“look good” were similar to those obtained 
by Jones with a naval aviation cadet sample. 
Construct validity, tested by correlating the 
Pensacola Z with the Guilford-Zimmerman 
Temperament Survey and the Allport-Vernon- 
Lindzey Study of Values, was consistent with 
the hypotheses of Jones and with those of the 
California researchers in The Authoritarian 
Personality. 

There were large distribution changes re- 
sulting from a set to “look bad” compared to 
the conventional set, but only small changes 
emerged from faking favorably. Further- 
more, the scores obtained by the subjects un- 
der the faking conditions, especially with the 
negative set, were opposite in direction from 
the intentions of the testees and the expecta- 
tions of the experimenters. Correlations es- 
tablished between the Z and the two instru- 
ments indicated above disappeared under both 
faking sets. 

The consequences of various sets for the 
forced-choice test form and for construct va- 
lidity were theoretically explored. The utility 
of this nonpolitical derivative of the Cali- 
fornia F Scale was recommended for leader- 
ship research in industrial, as well as military, 
settings on the basis of the present findings. 
The practical significance of set as a critical
parameter, even for the forced-choice form,
was demonstrated.


Received February 28, 1957.


References

1. Adorno, T. W., Frenkel-Brunswik, E., Levinson, D. J., & Sanford, R. N. The authoritarian personality. New York: Harper, 1950.

2. Anastasi, Anne. Psychological testing. New York: Macmillan, 1954.

3. Baier, D. E. Reply to Travers’ “A critical review of the validity and rationale of the forced-choice technique.” Psychol. Bull., 1951, 48, 421-434.

4. Cohn, T. S. The relation of the F Scale to a response set to answer positively. J. soc. Psychol., 1956, 44, 129-133.

5. Cronbach, L. J. Essentials of psychological testing. New York: Harper, 1949.

6. Cronbach, L. J., & Meehl, P. E. Construct validity in psychological tests. Psychol. Bull., 1955, 52, 281-302.

7. Edwards, A. L. The relationship between the judged desirability of a trait and the probability that the trait will be endorsed. J. appl. Psychol., 1953, 37, 90-93.

8. Gordon, L. V., & Stapleton, E. S. Fakability of a forced-choice personality test under realistic high school employment conditions. J. appl. Psychol., 1956, 40, 258-262.

9. Heron, A. The effects of real-life motivation on questionnaire response. J. appl. Psychol., 1956, 40, 65-68.

10. Jones, M. B. Aspects of the autonomous personality. Res. Rep. No. NM 001 108 109.04. Pensacola, Fla.: U. S. Naval School of Aviation Medicine, June 15, 1955.

11. Kaess, W. A., & Witryol, S. L. Memory for names and faces: A characteristic of social intelligence? J. appl. Psychol., 1955, 39, 457-462.

12. Longstaff, H. P., & Jurgensen, C. E. Fakability of the Jurgensen Classification Inventory. J. appl. Psychol., 1953, 37, 86-89.

13. Mais, R. D. Fakability of the Classification Inventory scored for self confidence. J. appl. Psychol., 1951, 35, 172-174.

14. Rusmore, J. T. Fakability of the Gordon Personal Profile. J. appl. Psychol., 1956, 40, 175-177.

15. Sundberg, N. D., & Bachelis, W. D. The fakability of two measures of prejudice: the California F Scale and Gough’s Pr Scale. J. abnorm. soc. Psychol., 1956, 52, 140-142.

16. Titus, H. E., & Hollander, E. P. The California F Scale in psychological research: 1950-1955. Psychol. Bull., 1957, 54, 47-64.

17. Travers, R. M. W. A critical review of the validity and rationale of the forced-choice technique. Psychol. Bull., 1951, 48, 62-70.

18. Wesman, A. G. Faking personality test scores in a simulated employment situation. J. appl. Psychol., 1952, 36, 112-113.

19. Witryol, S. L., & Kaess, W. A. Sex differences in social memory tasks. J. abnorm. soc. Psychol., 1957, 54, 343-346.





Journal of Applied Psychology 
Vol. 41, No. 5, 1957 


An Easier “Male” Mechanical Test for Use with Women ¹


William G. Mollenkopf 


The Procter & Gamble Company 


In the spring of 1954, four groups of Navy
recruits (two male and two female), each
numbering about 100 individuals, participated
in a study intended to compare the groups in 
mechanical ability. To two groups (one male 
and one female) the Breech Block Perform- 
ance Test was administered; the other two 
groups were given the Pictorial Assembly 
Test. 

Both tests were attempts to measure learn- 
ing ability in the mechanical area. The 
Breech Block Performance Test is a minia- 
ture training situation in which a sound film 
is first used to give instruction on how to as- 
semble the breechblock of a 40-mm. antiair- 
craft gun, and an assembly test is then used 
to evaluate what the subject has learned. 
There are five or more trials, each involving 
a film presentation and a testing on the performance
measure. From the resulting scores
an over-all rate-of-learning score is derived. 
The Pictorial Assembly Test involves the 
showing of the same training film as is em- 
ployed with the Breech Block Test, but in- 
cludes a paper-and-pencil “assembly” test to 
evaluate the learning. The film is shown 
twice before the first test administration, and 
three times before the second. 

In addition to the above measures, all sub- 
jects were given Bennett’s Mechanical Com- 
prehension Test, Form W-1, a form designed 
especially for use with women. (Slight modifications
were made in the instructions for administration
from those given in the manual
for Form W-1.) 

All subjects had previously taken the Navy
Mechanical Test during their recruit training.
This test is part of the Navy Basic Test Battery
and consists of 100 items, 50 on mechanical
and electrical knowledge and 50 on mechanical
comprehension. The latter are similar in type
to those in Bennett’s well-known
Mechanical Comprehension Test.

¹ This research was supported by funds provided
by the Bureau of Naval Personnel through Contract
Nonr-694(00) between the Office of Naval Research
and Educational Testing Service.


Problem 


After examining the data given in Table 1, 
modified from Allison’s report on this study 
(1), Dr. John T. Dailey of the Bureau of 
Naval Personnel suggested to the writer a pos- 
sible approach to the construction of a me- 
chanical ability test that might be more ap- 
propriate for use with enlisted women in the 
Navy. The main point of the approach is 
that Navy mechanical tasks ordinarily are 
regarded as male activities, and that, to be 
most valid, a test for enlisted women in the 
Navy should therefore involve items centering
around “male” activities and devices
(not egg beaters and the like), with the
important provision that these items be
appropriate in difficulty for women. The present
report presents the results of an attempt to 
“construct” a test following this suggestion. 


Method 


Several steps were involved in checking the effectiveness
of an easier “male” mechanical ability test for women:
first, the selection of items from the existing Basic Test
Battery Mechanical Test, according to item-analysis data;
second, the rescoring of the Mechanical Test answer sheets
for the two samples of female recruits; and, finally, the
computation of correlations of the resulting scores with
scores on the Breech Block Performance and Pictorial
Assembly Tests, these latter scores, especially those for
the Breech Block, here being considered as criterion
measures.

Use of item-analysis data. From the spring of 1952
until the fall of 1953, the classification office of the
Recruit Training Command, U. S. Naval Training
Center, Bainbridge, Maryland, provided Educational
Testing Service with the answer sheets for enlisted
women who had taken the tests comprising the Basic
Test Battery. In the fall of 1953, a representative
sample of 370 Mechanical Test answer sheets was
selected, and item-analyses were carried out, using
Fan’s modification of Flanagan’s technique (2). The
method yields both an index of item difficulty and
an estimate of the item-test correlation.

As mentioned earlier, the Mechanical Test consists
of two types of items. Separate analyses were therefore
carried out for Parts I and II. A two-way scatter plot
was then made for the items in each part, the item
difficulty index being plotted along the abscissa and
item-test correlation along the ordinate. Pairs of items
were then selected having closely matching difficulty
and internal consistency indices. Only those items for
which the difficulty index was at least 30 and the
item-test correlation at least .20 were considered.
Members of the pairs were then assigned at random to
Scoring Key A or B. Thirteen item pairs in each part met
the criteria laid down for selection, and so there were
26 items in Score A and another 26 in Score B.

The Basic Test Battery Mechanical Test answer
sheets for the two samples of WAVE recruits were
next rescored with these new keys. Three scores
were thus obtained: A, B, and Total. To get a good
appraisal of the “new” test, both its reliability and
its validity were needed. The A and B scores served
as the basis for securing an estimate of reliability,
whereas the total scores were correlated against the
criterion of the Breech Block scores for an estimate
of validity.


Table 1


Means, Standard Deviations, and Intercorrelations of Navy Mechanical Test Scores and Scores on the
Breech Block Performance and Pictorial Assembly Tests


Group A (109 Female Recruits): 1. Navy Mechanical Test; 2. Bennett Mech. Comp. W-1; 3. Breech Block Perf. Test
Group B (129 Male Recruits): 1. Navy Mechanical Test; 2. Bennett Mech. Comp. W-1; 3. Breech Block Perf. Test
Group C (109 Female Recruits): 1. Navy Mechanical Test; 2. Bennett Mech. Comp. W-1; 3. Pictorial Assembly Test
Group D (125 Male Recruits): 1. Navy Mechanical Test; 2. Bennett Mech. Comp. W-1; 3. Pictorial Assembly Test


Results


Reliability. For the estimation of the reliability
of the easier “male” form, the two
samples of WAVE recruits were combined.
The correlation of the A and B scores was
found to be .49, and this, when stepped up
by the Spearman-Brown correction, yielded a
reliability estimate of .66. (That the A and
B parts were evenly matched is shown by the
facts that the means were respectively 13.6
and 13.8 and that the two standard deviations
were 3.4 and 3.2.)

In view of the length of the experimental
test (52 items), the obtained reliability of .66
appears quite satisfactory. (A full-length
test composed of similar items might be
expected to have a reliability of about .80 when
used with women.)
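
The two reliability figures above can be checked with the general Spearman-Brown formula. The short Python sketch below is illustrative only. The function name is an assumption, and the lengthening factor of 100/52 assumes that “full-length” means the 100 items of the original test, as the text states later.

```python
def spearman_brown(reliability, k):
    """Predicted reliability when a test is lengthened k-fold.

    General Spearman-Brown formula: k*r / (1 + (k - 1)*r).
    """
    return k * reliability / (1 + (k - 1) * reliability)


# Correlation between the 26-item A and B scores, from the text.
r_ab = 0.49

# Stepping up from one half (26 items) to the 52-item total: k = 2.
rel_52 = spearman_brown(r_ab, 2)

# Projecting the 52-item form to an assumed 100 items: k = 100/52.
rel_100 = spearman_brown(rel_52, 100 / 52)

print(f"52-item reliability:  {rel_52:.2f}")
print(f"100-item reliability: {rel_100:.2f}")
```

With the reported correlation of .49 this gives about .66 for the 52-item form and about .79 for a 100-item form, in line with the “about .80” quoted above.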

Validity. The Basic Test Battery Mechanical
Test is intended as a means of predicting
success on mechanical-motor tasks involved in
various Navy enlisted assignments. In attempting
to appraise the validity of this test
and that of the easier “male” form for use
with enlisted women, a criterion was needed.
A follow-up of the recruits would have been
utterly impracticable; the groups tested would
disperse widely after recruit training, with
few going to any one type of assignment at
any particular station.


William G. Mollenkopf


Table 2

Comparison of Validities of Entire B.T.B. Mechanical
Test and the Easier “Male” Form for Women

Test                   Items   Mean    SD   Validity

Group A: 109 Female Recruits
(Breech Block Score as Criterion)

B.T.B. Mechanical       100    37.8   5.4     .39
Easier “Male” Form       52    27.8   5.5     .47

Group C: 109 Female Recruits
(Pictorial Assembly Score as Criterion)

B.T.B. Mechanical       100    36.2   5.9
Easier “Male” Form       52    27.0   5.9

This situation led to the consideration of
the Breech Block scores as criterion measures.
These scores represent the learning of a
mechanical-motor task under carefully controlled
conditions. While successful performance of
this task does not, of course, involve all of
the mechanical-motor skills that doubtless are
important in the complete range of Navy
billets, nevertheless carrying out this task
does in fact require the employment of a
number of important skills. Furthermore, the
Breech Block Test possessed the distinct
advantage of having been found to be suitable
for use with women.

A less effective case can be presented for
the similar use of the Pictorial Assembly Test
scores. Since it involves a “pictorial”
rather than an actual assembly, some question
may be raised as to how justifiable it would be
to treat scores from this test as
criterion (as distinguished from predictive)
measures.

Table 2 presents the correlations of the
easier “male” form of the Mechanical Test
with the Breech Block Performance Test and
the Pictorial Assembly Test. While scores on
both of the latter may be considered as
concurrent criterion measures, because of the
points just made regarding the nature of the
tasks these tests present, attention is focused
on the upper part of Table 2. Note that the
easier “male” form yields a validity of .47 as
compared with that of .39 for the original
test, which had almost twice as many items.
If a full-length (100-item) easier “male” form
were to be produced, it is estimated that it
would have a validity of .52 against this same
criterion. These points bear out Dr. Dailey’s
hunch very nicely: the easier “male” items do
apparently yield a more valid predictor of
mechanical-motor learning ability for use with
women.
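
The projected validity for a 100-item form can be approximated with the standard formula for the effect of test length on validity. The text does not say how the .52 was computed, so the Python sketch below is an assumption about the method rather than a reconstruction of it. The function name is hypothetical, and the inputs are the reported validity (.47), the reported reliability (.66), and a lengthening factor of 100/52.

```python
import math


def lengthened_validity(validity, reliability, k):
    """Predicted validity when the predictor test is lengthened k-fold.

    Standard formula: r_xy * sqrt(k) / sqrt(1 + (k - 1) * r_xx),
    where r_xy is the current validity and r_xx the current reliability.
    """
    return validity * math.sqrt(k) / math.sqrt(1 + (k - 1) * reliability)


# Values reported for the 52-item easier "male" form (Group A).
projected = lengthened_validity(validity=0.47, reliability=0.66, k=100 / 52)
print(f"Projected validity of a 100-item form: {projected:.2f}")
```

Under these assumptions the projection comes out at roughly .51, close to but not identical with the .52 given above; the small difference may reflect rounding or a different reliability figure in the original computation.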


Discussion 


Ordinarily, when a group of relatively
homogeneous test items is selected from a larger
number, the validity of the smaller group
tends to be somewhat less than that for the
original larger pool. That a higher validity
was found for the selected items in the present
instance therefore has implications regarding
the characteristics of the total group so
far as women are concerned.

Before proceeding on this point, however, it
may be well to point out that the item keying
was not being altered; only the collection of
items was changed. Nor were items chosen
because of high correlations with the criterion.
Rather, the prime considerations were the
difficulty of the item and the item-test correlation.

Thus it seems fair to interpret the findings
as indicating that the easier “male” items
were better predictors of mechanical learning
ability because they were more appropriate
for the subjects being tested, namely, female
recruits. Apparently the items that were too
hard for the WAVES not only added nothing,
but actually detracted from the validity of the
test.

These results are pertinent to the question
of the appropriateness of tests for use with
enlisted women in the Navy. So far as the
Mechanical Test is concerned, the findings
indicate that the presently used test is not so
effective for use with enlisted women as would
be one in which the items were of a more
appropriate level of difficulty, though still
concerned with “male” activities in the
mechanical area.


Summary 


Examination of mechanical test data for
male and female recruits suggested that a
more valid test for use with enlisted women
in the Navy might be constructed with items
concerned with “male” mechanical activities,
but at a difficulty level appropriate for female
recruits.

By selecting on the basis of item
characteristics, 52 items were chosen from the 100
in the Basic Test Battery Mechanical Test.
Using as a criterion the scores from the Breech
Block Performance Test (a measure of ability
to learn mechanical-motor skills), the validity
of the new “easier” 52-item test was found to
be .47 as compared with .39 for the original
100-item test.


Received March 11, 1957. 


References 


1. Allison, R. B., Jr. Mechanical ability: comparisons
of test scores for Naval enlisted men and
women. Bureau of Naval Personnel Technical
Bulletin 56-5. Princeton, N. J.: Educational
Testing Service, May 1956.

2. Fan, C. T. Item Analysis Table. Princeton, N. J.:
Educational Testing Service, 1952.
























































